Re: Web page source using wget?

2003-10-13 Thread Suhas Tembe
Thanks Hrvoje, using 
http://.../InventoryStatus.asp?cboSupplier=4541-134289&status=all&action-select=Query 
in IE worked like a charm. I didn't have to follow links. I am now trying to automate 
this using wget 1.8.2 (Windows).

There are two steps involved:
1). Log in to the customer's web site. I was able to create the following link after I 
looked at the form section in the source as explained to me earlier by Hrvoje.

wget http://customer.website.com?UserAccount=USER&AccessCode=PASSWORD&Locale=English 
(United States)&TimeZone=(GMT-5:00) Eastern Standard Time (USA & 
Canada)&action-Submit=Login

2). Execute: wget 
http://customer.website.com/InventoryStatus.asp?cboSupplier=4541-134289&status=all&action-select=Query
 

I tried different ways to get this working, but so far have been unsuccessful. Any 
ideas?

Thanks,
Suhas


- Original Message - 
From: Hrvoje Niksic [EMAIL PROTECTED]
To: Suhas Tembe [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Tuesday, October 07, 2003 6:12 PM
Subject: Re: Web page source using wget?


 Suhas Tembe [EMAIL PROTECTED] writes:
 
  It does look a little complicated... This is how it looks:
 
  <form action="InventoryStatus.asp" method="post" [...]
 [...]
  <select name="cboSupplier">
  <option value="4541-134289">454A</option>
  <option value="4542-134289" selected>454B</option>
  </select>
 
 Those are the important parts.  It's not hard to submit this form.
 With Wget 1.9, you can even use the POST method, e.g.:
 
 wget http://.../InventoryStatus.asp --post-data \
  'cboSupplier=4541-134289&status=all&action-select=Query' \
  -O InventoryStatus1.asp
 wget http://.../InventoryStatus.asp --post-data \
  'cboSupplier=4542-134289&status=all&action-select=Query' \
  -O InventoryStatus2.asp
 
 It might even work to simply use GET, and retrieve
 http://.../InventoryStatus.asp?cboSupplier=4541-134289&status=all&action-select=Query
 without the need for `--post-data' or `-O', but that depends on the
 ASP script that does the processing.
 
 The harder part is to automate this process for *any* values in the
 drop-down list.  You might need to use an intermediary Perl script
 that extracts all the <option value=...> tags from the HTML source of the
 page with the drop-down.  Then, from the output of the Perl script,
 you call Wget as shown above.
 
 It's doable, but it takes some work.  Unfortunately, I don't know of a
 (command-line) tool that would make this easier.
 



Re: Web page source using wget?

2003-10-13 Thread Hrvoje Niksic
Suhas Tembe [EMAIL PROTECTED] writes:

 There are two steps involved:
 1). Log in to the customer's web site. I was able to create the following link after 
 I looked at the form section in the source as explained to me earlier by Hrvoje.
 wget http://customer.website.com?UserAccount=USER&AccessCode=PASSWORD&Locale=English 
 (United States)&TimeZone=(GMT-5:00) Eastern Standard Time (USA & 
 Canada)&action-Submit=Login

Did you add --save-cookies=FILE?  By default Wget will use cookies,
but will not save them to an external file and they will therefore be
lost.

 2). Execute: wget
 http://customer.website.com/InventoryStatus.asp?cboSupplier=4541-134289&status=all&action-select=Query

For this step, add --load-cookies=FILE, where FILE is the same file
you specified to --save-cookies above.
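A minimal sketch of the resulting two-step invocation (the login path, USER and PASSWORD are placeholders, not the real site; `echo` is prepended so the commands are only printed here, drop it to actually run them). Note that each URL is single-quoted so the shell passes it to wget as one argument:

```shell
# Two-step cookie dance (sketch; URL and credentials are placeholders).
# Step 1: log in and save the session cookie to a file.
echo wget --save-cookies=cookies.txt \
  'http://customer.website.com/login.asp?UserAccount=USER&AccessCode=PASSWORD&action-Submit=Login'
# Step 2: fetch the report page, sending the saved cookie back.
echo wget --load-cookies=cookies.txt \
  'http://customer.website.com/InventoryStatus.asp?cboSupplier=4541-134289&status=all&action-select=Query'
```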


Re: Web page source using wget?

2003-10-13 Thread Suhas Tembe
I tried, but it doesn't seem to have worked. This is what I did:

wget --save-cookies=cookies.txt 
http://customer.website.com?UserAccount=USER&AccessCode=PASSWORD&Locale=English 
(United States)&TimeZone=(GMT-5:00) Eastern Standard Time (USA & 
Canada)&action-Submit=Login

wget --load-cookies=cookies.txt 
http://customer.website.com/supplyweb/smi/inventorystatus.asp?cboSupplier=4541-134289&status=all&action-select=Query
 --http-user=4542-134289

After executing the above two lines, it creates two files: 
1). [EMAIL PROTECTED] :  I can see that this file contains a message (among other 
things): "Your session has expired due to a period of inactivity"
2). [EMAIL PROTECTED]

Thanks,
Suhas





Re: Web page source using wget?

2003-10-13 Thread Suhas Tembe
A slight correction: the first wget should read:

wget --save-cookies=cookies.txt 
http://customer.website.com/supplyweb/general/default.asp?UserAccount=USER&AccessCode=PASSWORD&Locale=en-us&TimeZone=EST:-300&action-Submit=Login

I tried this link in IE, but it comes back to the same login screen. No error 
messages are displayed at this point. Am I missing something? I have attached the 
source for the login page.

Thanks,
Suhas





Re: Web page source using wget?

2003-10-13 Thread Jens Rösner
Hi Suhas!

Well, I am by no means an expert, but I think that wget 
closes the connection after the first retrieval. 
The SSL server realizes this and decides that wget has no right to log in 
for the second retrieval, even though the cookie is there.
I think that is correct behaviour for a secure server, isn't it?

Does this make sense? 
Jens






Re: Web page source using wget?

2003-10-13 Thread Hrvoje Niksic
Suhas Tembe [EMAIL PROTECTED] writes:

 I tried, but it doesn't seem to have worked. This is what I did:

 wget --save-cookies=cookies.txt 
 http://customer.website.com?UserAccount=USER&AccessCode=PASSWORD&Locale=English 
 (United States)&TimeZone=(GMT-5:00) Eastern Standard Time (USA & 
 Canada)&action-Submit=Login

Hopefully you used quotes to protect the spaces in URLs from the
shell?

After the first command, does `cookies.txt' contain what looks like a
valid cookie?
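Beyond shell quoting, the unencoded spaces and parentheses in values like "English (United States)" can themselves break the request. A rough sketch of percent-encoding those values before building the URL (host, path and credentials are placeholders; only the troublesome characters seen in this thread are handled):

```shell
# enc: percent-encode %, space, parentheses, '&' and ':' in one value.
enc() { printf '%s' "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' \
  -e 's/(/%28/g' -e 's/)/%29/g' -e 's/&/%26/g' -e 's/:/%3A/g'; }

locale=$(enc 'English (United States)')
tz=$(enc '(GMT-5:00) Eastern Standard Time (USA & Canada)')
# Print the encoded login URL; the values no longer contain spaces,
# so the shell can no longer split them apart.
echo "http://customer.website.com/default.asp?UserAccount=USER&AccessCode=PASSWORD&Locale=$locale&TimeZone=$tz&action-Submit=Login"
```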


Re: Web page source using wget?

2003-10-13 Thread Hrvoje Niksic
Jens Rösner [EMAIL PROTECTED] writes:

 Well, I am by no means an expert, but I think that wget closes the
 connection after the first retrieval. The SSL server realizes this
 and decides that wget has no right to log in for the second
 retrieval, even though the cookie is there.  I think that is a
 correct behaviour for a secure server, isn't it?

Why would it be correct?  Persistent connections are a mere
optimization; a new connection should work as well as the old one, as
long as the credentials (usually provided by a cookie) are provided.

There are security mechanisms that authorize on a per-connection
basis and require a new login for each new connection (I believe
NTLM is like this), but that should not be the case here.

Even if it were the case, you could tell Wget to use the same
connection, like this:

wget http://URL1... http://URL2...

In that case you shouldn't even have to bother with `--save-cookies'
and `--load-cookies'.  But maybe something else is going wrong for
Suhas; I really don't know.


Re: Web page source using wget?

2003-10-13 Thread Suhas Tembe
So, is there a way I can get to the page I want after logging into a secure server 
using wget? Can I keep the SSL connection open for the second retrieval to work?

The other thing I noticed is that the first URL (to log in) does not seem to work, 
because when I use that same URL in IE, it brings me back to the login screen (see 
attached source of the login page). I don't get logged in. I am not quite sure whether 
it is the URL that is incorrect or something else.

Thanks,
Suhas


 
<html xmlns:bml="urn:brainna.com:bml:2002">
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>SupplyWEB Login</title>
</head><script language="JavaScript1.1" type="text/javascript">
var amSymbol = "AM";
var pmSymbol = "PM";
var negativeSymbol = "-";

var dateSeparator = "/";
var dateFormat = "M/dd/";

var timeSeparator = ":";
var timeFormat = "h:mm:ss t";

var decimalSeparator = ".";

function setIcon(icon, required, valid) {
  if (!valid) {
    icon.alt = "X";
    icon.src = "../images/error.gif";
  } else if (required) {
    icon.alt = "*";
    icon.src = "../images/required.gif";
  } else {
    icon.alt = " ";
    icon.src = "../images/blank.gif";
  }
}

function login_UserAccount_validate() {
  var valid = true;
  setIcon(document.login.UserAccount_icon, true, valid);
  return valid;
}

function login_AccessCode_validate() {
  var valid = true;
  setIcon(document.login.AccessCode_icon, true, valid);
  return valid;
}

function login_Locale_validate() {
  var valid = true;
  if (valid) valid = login_Locale_custom_validate(document.login.Locale);
  setIcon(document.login.Locale_icon, true, valid);
  return valid;
}

function

Re: Web page source using wget?

2003-10-13 Thread Suhas Tembe
Cookies.txt looks like this:

# HTTP cookie file.
# Generated by Wget on 2003-10-13 13:19:26.
# Edit at your own risk.

There is nothing after the 3rd line. So, it doesn't look like a valid cookie file.




Re: Web page source using wget?

2003-10-13 Thread Hrvoje Niksic
Suhas Tembe [EMAIL PROTECTED] writes:

 Cookies.txt looks like this:

 # HTTP cookie file.
 # Generated by Wget on 2003-10-13 13:19:26.
 # Edit at your own risk.

 There is nothing after the 3rd line. So, it doesn't look like a
 valid cookie file.

It's valid all right, but there are no cookies inside.  The thing is,
Wget will only save cookies that are marked as permanent through an
expiry date in the future.  Currently there is no way to force saving
non-permanent cookies.

You can, however, run both URLs in the same Wget invocation by
providing them both on the command line.  That way cookies should be
shared.


Re: Web page source using wget?

2003-10-13 Thread Hrvoje Niksic
Suhas Tembe [EMAIL PROTECTED] writes:

 The other thing I noticed is that the first URL (to log in) does not
 seem to work, because when I use that same URL in IE, it brings me
 back to the login screen (see attached source of the login
 page). I don't get logged-in.

Why are you using that URL if it is confirmed that it doesn't work?

The <form> tag in the login script specifies the POST method.
Therefore it is quite possible that the login script requires the use
of POST.  If that is the case, you'll need to get Wget 1.9-beta and
provide the login information with the `--post-data' option.

I'm sorry I don't have better news for you.  Web services can be a
real pain.


Re: Web page source using wget?

2003-10-13 Thread Jens Rösner
Hi Hrvoje!

  retrieval, even though the cookie is there.  I think that is a
  correct behaviour for a secure server, isn't it?
 Why would it be correct?  
Sorry, I seem to have been misled by my own (limited) experience:
From the few secure sites I use, most will not let you 
log in again after you close and restart your browser or redial 
your connection. That's what reminded me of Suhas' problem.

 Even if it were the case, you could tell Wget to use the same
 connection, like this:
 wget http://URL1... http://URL2...
Right, I always forget that, thanks!

Cya
Jens






Re: Web page source using wget?

2003-10-13 Thread Hrvoje Niksic
Jens Rösner [EMAIL PROTECTED] writes:

 Hi Hrvoje!

  retrieval, even though the cookie is there.  I think that is a
  correct behaviour for a secure server, isn't it?
 Why would it be correct?  
 Sorry, I seem to have been misled by my own (limited) experience:
From the few secure sites I use, most will not let you 
 log in again after you closed and restarted your browser

That merely means that the cookie is marked non-permanent -- which is
probably the case here as well.  A site that banned reconnecting would
effectively ban all HTTP/1.0 browsers, which would probably be going
too far.



Re: Web page source using wget?

2003-10-07 Thread Suhas Tembe
Thanks everyone for the replies so far..

The problem I am having is that the customer is using ASP & JavaScript. The URL stays 
the same as I click through the links. So, using "wget URL" for the page I want may 
not work (I may be wrong). Any suggestions on how I can tackle this?

Thanks,
Suhas

- Original Message - 
From: Hrvoje Niksic [EMAIL PROTECTED]
To: Suhas Tembe [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, October 06, 2003 5:19 PM
Subject: Re: Web page source using wget?


 Suhas Tembe [EMAIL PROTECTED] writes:
 
  Hello Everyone,
 
  I am new to this wget utility, so pardon my ignorance.. Here is a
  brief explanation of what I am currently doing:
 
  1). I go to our customer's website every day & log in using a User Name & Password.
  2). I click on 3 links before I get to the page I want.
  3). I right-click on the page & choose "view source". It opens it up in Notepad.
  4). I save the source to a file & subsequently perform various tasks on that 
  file.
 
  As you can see, it is a manual process. What I would like to do is
  automate this process of obtaining the source of a page using
  wget. Is this possible? Maybe you can give me some suggestions.
 
 It's possible, in fact it's what Wget does in its most basic form.
 Disregarding authentication, the recipe would be:
 
 1) Write down the URL.
 
 2) Type `wget URL' and you get the source of the page in file named
SOMETHING.html, where SOMETHING is the file name that the URL ends
with.
 
 Of course, you will also have to specify the credentials to the page,
 and Tony explained how to do that.
 



Re: Web page source using wget?

2003-10-07 Thread Hrvoje Niksic
Suhas Tembe [EMAIL PROTECTED] writes:

 Thanks everyone for the replies so far..

 The problem I am having is that the customer is using ASP & JavaScript.
 The URL stays the same as I click through the links.

URL staying the same is usually a sign of the use of frames, not of ASP
and JavaScript.  Instead of looking at the URL entry field, try using
"copy link to clipboard" instead of clicking on the last link.  Then
use Wget on that.



Re: Web page source using wget?

2003-10-07 Thread Suhas Tembe
Got it! Thanks! So far so good. After logging in, I was able to get to the page I am 
interested in. There was one thing that I forgot to mention in my earlier posts (I 
apologize)... this page contains a drop-down list of our customer's locations. At 
present, I choose one location from the drop-down list & click "Submit" to get the 
data, which is displayed in a report format. I right-click & then choose "view 
source" & save the source to a file. I then choose the next location from the 
drop-down list & click "Submit" again. I again do a "view source" & save the source to 
another file, and so on for all their locations.

I am not quite sure how to automate this process. How can I do this non-interactively, 
especially the "Submit" portion of the page? Is this possible using wget?

Thanks,
Suhas




Re: Web page source using wget?

2003-10-07 Thread Hrvoje Niksic
Suhas Tembe [EMAIL PROTECTED] writes:

 this page contains a drop-down list of our customer's locations.
 At present, I choose one location from the drop-down list & click
 "Submit" to get the data, which is displayed in a report format. I
 right-click & then choose "view source" & save the source to a file.
 I then choose the next location from the drop-down list, click
 "Submit" again. I again do a "view source" & save the source to
 another file and so on for all their locations.

It's possible to automate this, but it requires some knowledge of
HTML.  Basically, you need to look at the <form>...</form> part of the
page and find the <select> tag that defines the drop-down.  Assuming
that the form looks like this:

<form action="http://foo.com/customer" method="GET">
  <select name="location">
    <option value="ca">California
    <option value="ma">Massachusetts
    ...
  </select>
</form>

you'd automate getting the locations by doing something like:

for loc in ca ma ...
do
  wget "http://foo.com/customer?location=$loc"
done

Wget will save the respective sources in files named
customer?location=ca, customer?location=ma, etc.

But this was only an example.  The actual process depends on what's in
the form, and it might be considerably more complex than this.



Re: Web page source using wget?

2003-10-07 Thread Suhas Tembe
It does look a little complicated... This is how it looks:

<form action="InventoryStatus.asp" method="post" name="select" onsubmit="return 
select_validate();" style="margin:0">
<div style="margin-top:10px">
<table border="1" bordercolor="#d9d9d9" bordercolordark="#ff" 
bordercolorlight="#d9d9d9" cellpadding="3" cellspacing="0" width="100%">
<tr>
<td style="font-weight:bold;color:black;background-color:#CC;text-align:right" 
width="20%"><nobr>Supplier&nbsp;</nobr></td>
<td style="color:black;background-color:#F0;text-align:left" 
colspan="2"><nobr><select name="cboSupplier"><option value="4541-134289">454A</option>
<option value="4542-134289" selected>454B</option></select> <img id="cboSupplier_icon" 
name="cboSupplier_icon" src="../images/required.gif" alt="*"></nobr></td>
</tr>
<tr>
<td style="font-weight:bold;color:black;background-color:#CC;text-align:right" 
width="20%"><nobr>Quantity Status&nbsp;</nobr></td>
<td style="color:black;background-color:#F0;text-align:left" colspan="2">
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td>
<table border="0">
<tr>
<td width="1"><input id="choice_IDAMCB3B" name="status" type="radio" value="over"></td>
<td style="color:black;background-color:#F0;text-align:left"><span 
onclick="choice_IDAMCB3B.checked=true;"> Over</span></td>
<td width="1"><input id="choice_IDARCB3B" name="status" type="radio" 
value="under"></td>
<td style="color:black;background-color:#F0;text-align:left"><span 
onclick="choice_IDARCB3B.checked=true;"> Under</span></td>
<td width="1"><input id="choice_IDAWCB3B" name="status" type="radio" value="both"></td>
<td style="color:black;background-color:#F0;text-align:left"><span 
onclick="choice_IDAWCB3B.checked=true;"> Both</span></td>
<td width="1"><input id="choice_IDA1CB3B" name="status" type="radio" value="all" 
checked></td>
<td style="color:black;background-color:#F0;text-align:left"><span 
onclick="choice_IDA1CB3B.checked=true;"> All</span></td>
</tr>
</table>
</td>
<td> <img id="status_icon" name="status_icon" src="../images/blank.gif" alt=""></td>
</tr>
</table>
</td>
</tr>
<tr>
<td style="font-weight:bold;color:black;background-color:#CC">&nbsp;</td>
<td colspan="2" 
style="font-weight:bold;color:black;background-color:#CC;text-align:left"><input 
type="submit" name="action-select" value="Query" onclick="doValidate = true;"></td>
</tr>
</table>
</div>
</form>


I don't see any specific URL that would get the relevant data after I hit "Submit". 
Maybe I am missing something...

Thanks,
Suhas





Re: Web page source using wget?

2003-10-07 Thread Hrvoje Niksic
Suhas Tembe [EMAIL PROTECTED] writes:

 It does look a little complicated... This is how it looks:

 <form action="InventoryStatus.asp" method="post" [...]
[...]
 <select name="cboSupplier">
 <option value="4541-134289">454A</option>
 <option value="4542-134289" selected>454B</option>
 </select>

Those are the important parts.  It's not hard to submit this form.
With Wget 1.9, you can even use the POST method, e.g.:

wget http://.../InventoryStatus.asp --post-data \
 'cboSupplier=4541-134289&status=all&action-select=Query' \
 -O InventoryStatus1.asp
wget http://.../InventoryStatus.asp --post-data \
 'cboSupplier=4542-134289&status=all&action-select=Query' \
 -O InventoryStatus2.asp

It might even work to simply use GET, and retrieve
http://.../InventoryStatus.asp?cboSupplier=4541-134289&status=all&action-select=Query
without the need for `--post-data' or `-O', but that depends on the
ASP script that does the processing.

The harder part is to automate this process for *any* values in the
drop-down list.  You might need to use an intermediary Perl script
that extracts all the <option value=...> tags from the HTML source of the
page with the drop-down.  Then, from the output of the Perl script,
you call Wget as shown above.

It's doable, but it takes some work.  Unfortunately, I don't know of a
(command-line) tool that would make this easier.
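As a rough dry-run illustration of that intermediary-script idea (any scripting language would do; the file name and URL here are placeholders, and `echo` only prints the wget commands instead of running them):

```shell
# Stand-in for the saved HTML of the page with the drop-down.
cat > dropdown.html <<'EOF'
<select name="cboSupplier">
<option value="4541-134289">454A</option>
<option value="4542-134289" selected>454B</option>
</select>
EOF

# Extract every <option value="..."> and print one wget command per value.
for sup in $(sed -n 's/.*<option value="\([^"]*\)".*/\1/p' dropdown.html)
do
  echo wget "http://.../InventoryStatus.asp?cboSupplier=$sup&status=all&action-select=Query" \
    -O "InventoryStatus-$sup.html"
done
```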



Re: Web page source using wget?

2003-10-06 Thread Tony Lewis
Suhas Tembe wrote:

 1). I go to our customer's website every day & log in using a User Name &
 Password.
[snip]
 4). I save the source to a file  subsequently perform various tasks on
 that file.

 What I would like to do is automate this process of obtaining the source
 of a page using wget. Is this possible?

That depends on how you enter your user name and password. If it's via
an HTTP user ID and password, that's pretty easy:

wget http://www.custsite.com/some/page.html --http-user=USER --http-passwd=PASS

If you supply your user ID and password via a web form, it will be tricky
(if not impossible) because wget doesn't POST forms (unless someone added
that option while I wasn't looking. :-)

Tony



Re: Web page source using wget?

2003-10-06 Thread Hrvoje Niksic
Tony Lewis [EMAIL PROTECTED] writes:

 wget http://www.custsite.com/some/page.html --http-user=USER --http-passwd=PASS

 If you supply your user ID and password via a web form, it will be
 tricky (if not impossible) because wget doesn't POST forms (unless
 someone added that option while I wasn't looking. :-)

Wget 1.9 can send POST data.

But there's a simpler way to handle web sites that use cookies for
authorization: make Wget use the site's own cookie.  Export cookies as
explained in the manual, and specify:

wget --load-cookies=COOKIE-FILE http://...

Here is an excerpt from the manual section that explains how to export
cookies.

`--load-cookies FILE'
 Load cookies from FILE before the first HTTP retrieval.  FILE is a
 textual file in the format originally used by Netscape's
 `cookies.txt' file.

 You will typically use this option when mirroring sites that
 require that you be logged in to access some or all of their
 content.  The login process typically works by the web server
 issuing an HTTP cookie upon receiving and verifying your
 credentials.  The cookie is then resent by the browser when
 accessing that part of the site, and so proves your identity.

 Mirroring such a site requires Wget to send the same cookies your
 browser sends when communicating with the site.  This is achieved
 by `--load-cookies'--simply point Wget to the location of the
 `cookies.txt' file, and it will send the same cookies your browser
 would send in the same situation.  Different browsers keep textual
 cookie files in different locations:

Netscape 4.x.
  The cookies are in `~/.netscape/cookies.txt'.

Mozilla and Netscape 6.x.
  Mozilla's cookie file is also named `cookies.txt', located
  somewhere under `~/.mozilla', in the directory of your
  profile.  The full path usually ends up looking somewhat like
  `~/.mozilla/default/SOME-WEIRD-STRING/cookies.txt'.

Internet Explorer.
  You can produce a cookie file Wget can use by using the File
  menu, Import and Export, Export Cookies.  This has been
  tested with Internet Explorer 5; it is not guaranteed to work
  with earlier versions.

Other browsers.
  If you are using a different browser to create your cookies,
  `--load-cookies' will only work if you can locate or produce a
  cookie file in the Netscape format that Wget expects.

 If you cannot use `--load-cookies', there might still be an
 alternative.  If your browser supports a cookie manager, you can
 use it to view the cookies used when accessing the site you're
 mirroring.  Write down the name and value of the cookie, and
 manually instruct Wget to send those cookies, bypassing the
 official cookie support:

  wget --cookies=off --header "Cookie: NAME=VALUE"




Re: Web page source using wget?

2003-10-06 Thread Hrvoje Niksic
Suhas Tembe [EMAIL PROTECTED] writes:

 Hello Everyone,

 I am new to this wget utility, so pardon my ignorance.. Here is a
 brief explanation of what I am currently doing:

 1). I go to our customer's website every day & log in using a User Name & Password.
 2). I click on 3 links before I get to the page I want.
 3). I right-click on the page & choose "view source". It opens it up in Notepad.
 4). I save the source to a file & subsequently perform various tasks on that file.

 As you can see, it is a manual process. What I would like to do is
 automate this process of obtaining the source of a page using
 wget. Is this possible? Maybe you can give me some suggestions.

It's possible, in fact it's what Wget does in its most basic form.
Disregarding authentication, the recipe would be:

1) Write down the URL.

2) Type `wget URL' and you get the source of the page in file named
   SOMETHING.html, where SOMETHING is the file name that the URL ends
   with.

Of course, you will also have to specify the credentials to the page,
and Tony explained how to do that.