Re: Wget not resending cookies on "Location:" in headers

2005-04-25 Thread wget
> Is there a publicly accessible site that exhibits this problem?
I've set up a small example which illustrates the problem. Files can be 
found at http://dev.mesca.net/wget/ (using demo:test as login).

Three files:
- setcookie.php: sets a test cookie;
- getcookie.php: redirects (via a "Location:" header) to get-cookie-redirect.php;
- get-cookie-redirect.php: reads back and displays the cookie it receives.

We first set the cookie by wgetting setcookie.php.
Then we try to read the cookie back by querying getcookie.php, which 
redirects to get-cookie-redirect.php: Wget doesn't resend the cookie there.

$ wget --http-user=demo --http-passwd=test --cookies=on \
      --save-cookies=cookie.txt http://dev.mesca.net/wget/setcookie.php
$ wget --http-user=demo --http-passwd=test --cookies=on \
      --load-cookies=cookie.txt http://dev.mesca.net/wget/getcookie.php
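In case a debug trace is useful, it could presumably be captured by 
re-running the second command with the -d flag and a log file (a sketch; 
debug.log is just a name I picked):

$ wget -d -o debug.log --http-user=demo --http-passwd=test --cookies=on \
      --load-cookies=cookie.txt http://dev.mesca.net/wget/getcookie.php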

Note: tests were made using the latest version from CVS 
(1.10-alpha2+cvs-dev).

On 26 Apr 05, at 00:09, Hrvoje Niksic wrote:
>> - The server responds with a "Location: http://host.com/member.php" in
>> headers. Here is the point: member.php requires cookies defined by
>> index.php and checkuser.php. However, these cookies are not resent by
>> Wget.
> That sounds like a bug.  Wget is supposed to resend the cookies.
> Could you provide any kind of debug information?  The contents of the
> cookies are not important, but the "path" parameter and the expiry date
> are.
According to my tests, the problem is still reproducible regardless of what 
"Path" and "Expiry date" contain.

Regards,
Pierre


Re: Wget sucks! [was: Re: wget in a loop?]

2005-04-25 Thread Andrzej
> I agree with your criticism, if not with your tone.  We are working on
> improving Wget, and I believe that the problems you have seen will be
> fixed in the versions to come.  (I plan to look into some of them for
> the 1.11 release.)

OK. Thanks. Good to hear that. Looking forward impatiently to the new 
version. :)

> That problem has been corrected, and it can be worked around by not
> using -P.

Yes, indeed. Thanks.

In order to download that whole website:
http://znik.wbc.lublin.pl/ChemFan/
which, unfortunately, is partly also available under this address: 
http://lists.man.lodz.pl/pipermail/chemfan/
I had to manually modify the content of this web page:
http://znik.wbc.lublin.pl/ChemFan/Archiwum/index.html
(which contains the above link) and use the following "tricks" to make a 
mirror of it all:



cd $HOME/web/chemfan.pl

wget -m -nv -k -K -E -nH --cut-dirs=1 -np -t 1000 -D wbc.lublin.pl \
  -o $HOME/logiwget/logchemfan.pl -p http://znik.wbc.lublin.pl/ChemFan/ && \
cd $HOME/web/chemfan.pl/arch && \
wget -m -nv -k -K -E -nH -np --cut-dirs=2 -t 1000 -D lists.man.lodz.pl \
  --follow-ftp -o $HOME/logiwget/logchemfanarchive.pl \
  -p http://lists.man.lodz.pl/pipermail/chemfan/ && \
cp $HOME/web/domirrora/Archiwum/index.html \
  $HOME/web/chemfan.pl/Archiwum/index.html

==

If you know how to make it simpler, let me know.

Do you think that && is really necessary here?

Of course, for other sites, other "recipes" might need to be developed in 
order to mirror them correctly, so unfortunately this approach is not 
universal at all.

And unfortunately, not everything went fine when using the above 
"script".

The page 
http://lists.man.lodz.pl/pipermail/chemfan/ 
contains a link:
ftp://ftp.man.lodz.pl/pub/doc/LISTY-DYSKUSYJNE/CHEMFAN
and it seems that to mirror this as well I'll have to run yet another wget 
session, and then manually modify and copy the page: 
http://chemfan.pl.feedle.com/arch/index.html

a.


Not detected hyperlink in recursive downloading (wget 1.9.1)

2005-04-25 Thread nemeth
Dear developers,

I think I have found a bug in wget.

I tried to download the European Constitution in English from

http://europa.eu.int/eur-lex/lex/en/treaties/dat/12004V/htm/12004V.html

with the following wget command:

wget -r -l 2
http://europa.eu.int/eur-lex/lex/en/treaties/dat/12004V/htm/12004V.html

In this file the "20. Protocol on the position of Denmark" link is not
detected, and the file at
http://europa.eu.int/eur-lex/lex/en/treaties/dat/12004V/htm/C2004310EN.01035601.htm
is not downloaded. The link works in the Firefox and Links browsers.
Surprisingly, this relative link is no different from the other links in
12004V.html.

I repeated the download, with a similar result.

I attached the 12004V.html file.

Wget version is 1.9.1.

Best regards,

László Németh

PS: Many thanks for wget! I hope this bug report will help to make your 
nice program even better.





This message was sent using IMP, the Internet Messaging Program.
[Attachment: 12004V.html ("Celex Test"), Official Journal C 310, 16 December 2004: the table of contents of the Treaty establishing a Constitution for Europe, listing the annexed protocols, among them "20. Protocol on the position of Denmark".]

Re: Wget not resending cookies on "Location:" in headers

2005-04-25 Thread Hrvoje Niksic
[EMAIL PROTECTED] writes:

> - The server responds with a "Location: http://host.com/member.php" in
> headers. Here is the point: member.php requires cookies defined by
> index.php and checkuser.php. However, these cookies are not resent by
> Wget.

That sounds like a bug.  Wget is supposed to resend the cookies.

Could you provide any kind of debug information?  The contents of the
cookies are not important, but the "path" parameter and the expiry date
are.

Is there a publicly accessible site that exhibits this problem?


Re: SSL option documentation

2005-04-25 Thread Mauro Tortonesi
On Saturday 23 April 2005 06:52 pm, you wrote:
> Mauro Tortonesi <[EMAIL PROTECTED]> writes:
>
> > i would change:
> >
> > --sslcerttype=0/1 to --sslcerttype=PEM/ASN1
> > --sslcheckcert=1/0 to --no-sslcheckcert/--sslcheckcert
> > --sslprotocol=0-3 to --no-ssl/--ssl=SSLv2/SSLv3/TLSv1
>
> The name could (and IMHO should) be made even more readable,
> e.g. --ssl-cert-type or even --ssl-certificate-type.  It might make
> sense to drop the "ssl" prefix altogether because those options also
> apply to TLS.  The option would then be --certificate-type, which is
> shorter and nicer.  I believe curl has done that.
>
> Since --sslprotocol can specify TLS protocol, it might be more
> accurate to name it --secure-protocol (--protocol is too general),
> with the accepted values "auto" (default), "sslv2", "sslv3", and
> "tlsv1", all case-insensitive.  (Note that the current --sslprotocol=0
> does *not* correspond to --no-ssl; it means choose automatically.  The
> fact that it confused you is further proof of the brokenness of
> current option names!)

--certificate-type and --secure-protocol seem fine to me.

> > the other options seem fine to me, although i prefer names like
> > --ssl_cert_file to --sslcertfile.
>
> Sure, except it should be --ssl-cert-file; Wget (and GNU software in
> general) doesn't use underscores in option names.

right. 
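just to illustrate, with the proposed names an invocation might look like 
this (hypothetical syntax, nothing is implemented yet, and the URL is only 
a placeholder):

$ wget --certificate-type=PEM --secure-protocol=tlsv1 https://example.com/
$ wget --secure-protocol=auto https://example.com/   # default: pick automatically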

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.     http://www.ing.unife.it
Institute of Human & Machine Cognition    http://www.ihmc.us
GNU Wget - HTTP/FTP file retrieval tool   http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux             http://www.deepspace6.net
Ferrara Linux User Group                  http://www.ferrara.linux.it


Wget not resending cookies on "Location:" in headers

2005-04-25 Thread wget
Hello,
I use Wget version 1.10-alpha2+cvs-dev (because of the availability of 
the --keep-session-cookies option).

I'm trying to wget a member page, where cookies are required for access.
The usual login procedure is:
-
- Get a session cookie (PHPSESSID) from http://host.com/index.php
- Get http://host.com/checkuser.php, which defines additional cookies. 
checkuser.php requires PHPSESSID, username, and password via the POST 
method.
- The server responds with a "Location: http://host.com/member.php" in 
headers. Here is the point: member.php requires cookies defined by 
index.php and checkuser.php. However, these cookies are not resent by 
Wget. Thus I can't access member.php (for some reason, the download of 
member.php ends with a timeout when the cookies aren't set).

Here is how I proceed:
--
- Get the PHPSESSID
$ wget --cookies=on --keep-session-cookies --save-cookies=cookie.txt \
      http://host.com/index.php
(I then extract the value of PHPSESSID into a shell variable with 
"cut -s -f 7 cookie.txt"; see the sketch below.)
- Authenticate on http://host.com/checkuser.php
$ wget --referer='http://host.com/index.php' --cookies=on \
      --load-cookies=cookie.txt --keep-session-cookies \
      --save-cookies=cookie.txt \
      --post-data='PHPSESSID=$phpsessid&username=$usr&password=$pwd' \
      http://host.com/checkuser.php
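A minimal sketch of that extraction (assuming cookie.txt is in the usual 
Netscape cookie-file format, where the value is the seventh tab-separated 
field):

$ # assumption: cookie.txt is a Netscape-format cookie file
$ phpsessid=$(grep PHPSESSID cookie.txt | cut -s -f 7)

(Note that $phpsessid only expands inside double quotes, so the --post-data 
argument above would need double rather than single quotes to pick it up.)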

Wget downloads checkuser.php and sets the new cookies properly, then 
follows the redirect to member.php, but keeps retrying and eventually ends 
with a timeout.

Considerations
--
I can see two ways to avoid this issue:
- Tell Wget not to follow the link in the "Location:" header field; I 
could then resend the cookies to member.php myself (see the sketch below);
- Tell Wget to resend the cookies when following links in headers.
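A rough sketch of the first idea done by hand (hypothetical: it assumes the 
cookies saved from checkuser.php above ended up in cookie.txt, and simply 
requests member.php directly instead of relying on the redirect):

$ wget --cookies=on --load-cookies=cookie.txt --keep-session-cookies \
      --referer='http://host.com/checkuser.php' http://host.com/member.php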

I didn't find anything in the documentation about these workarounds. 
How can I resolve this problem?

Best regards,
Pierre


Re: How to do a file upload

2005-04-25 Thread Jim Cox
--- John <[EMAIL PROTECTED]> wrote:
> > 
> > Hello,
> > 
> > I have a relatively simple process of uploading a
> file that I am trying
> > to
> > perform with wget but I am having problems with
> the syntax.  Can someone
> > please help?  Thank you in advance.
> > 
> > I am trying to replicate the process of uploading
> a file that is common
> > to
> > a lot of webpages.  The webpage has an INPUT
> > TYPE="file" dialog and also a
> > button to submit the document.
> > 
> > I am just trying to upload a very simple csv file.
>  It only has 2
> > columns
> > and 2 rows and is shown below:
> > 
> > Order,Number
> > 2245,3
> > 2246,7
> > 
> > The page I am trying to interface with has a
> button that says "import
> > order items info" and a field next to it where you
> enter the file path. 
> > Here is the HTML of the page that I am trying to
> interface with:
> > 
> >  > action="URL_GOES_HERE/po_management.php"
> method="POST">
> >  
> > 
> > 
> > Order Import CSV:
> >  value="300" />
> > 
> > 
> > 
> >  > NAME="import_items"
> > class="button" />
> > 
> > 
> > This is what I think my wget command should be:
> > wget --post-file=upload.txt
> --output-document=output.txt
> > URL_GOES_HERE/po_management.php
> > 
> > Where upload.txt is:
> > &import_items=import order items
> info&import_file=Order, Number
> > 2245,3
> > 2246,7
> > 
> > If someone could please help me with what I am
> doing wrong and what I
> > should change I would appreciate it.  Thank you.

From the "enctype" of the form you supplied it
looks like you have to POST with "multipart/form-data"
as in RFC 1867
(http://www.faqs.org/rfcs/rfc1867.html). Look in
section 6 of that doc for examples.
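For illustration, the body of such a POST would look roughly like this 
(patterned on the example in section 6 of RFC 1867; the boundary, filename 
and Content-Type are made up here, and the field names follow your message). 
Note that wget's --post-file sends the file verbatim and cannot build a 
multipart body like this by itself:

Content-Type: multipart/form-data; boundary=AaB03x

--AaB03x
Content-Disposition: form-data; name="import_items"

import order items info
--AaB03x
Content-Disposition: form-data; name="import_file"; filename="upload.csv"
Content-Type: text/csv

Order,Number
2245,3
2246,7
--AaB03x--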



How to do a file upload

2005-04-25 Thread John
> 
> Hello,
> 
> I have a relatively simple process of uploading a file that I am trying
> to
> perform with wget but I am having problems with the syntax.  Can someone
> please help?  Thank you in advance.
> 
> I am trying to replicate the process of uploading a file that is common
> to
> a lot of webpages.  The webpage has an INPUT TYPE="file" dialog and also
> a button to submit the document.
> 
> I am just trying to upload a very simple csv file.  It only has 2
> columns
> and 2 rows and is shown below:
> 
> Order,Number
> 2245,3
> 2246,7
> 
> The page I am trying to interface with has a button that says "import
> order items info" and a field next to it where you enter the file path. 
> Here is the HTML of the page that I am trying to interface with:
> 
>  action="URL_GOES_HERE/po_management.php" method="POST">
>  
> 
>   
>   Order Import CSV:
>   
>   
>   
>   
>NAME="import_items"
> class="button" />
>   
> 
> This is what I think my wget command should be:
> wget --post-file=upload.txt --output-document=output.txt
> URL_GOES_HERE/po_management.php
> 
> Where upload.txt is:
> &import_items=import order items info&import_file=Order, Number
> 2245,3
> 2246,7
> 
> If someone could please help me with what I am doing wrong and what I
> should change I would appreciate it.  Thank you.
 



Re: Wget sucks! [was: Re: wget in a loop?]

2005-04-25 Thread Hrvoje Niksic
"Andrzej " <[EMAIL PROTECTED]> writes:

> The multitude of options in Wget is just an illusion. In real life Wget
> cannot cope with mirroring sites.

I agree with your criticism, if not with your tone.  We are working on
improving Wget, and I believe that the problems you have seen will be
fixed in the versions to come.  (I plan to look into some of them for
the 1.11 release.)

> And even if a site did not have the above problems, the problem with
> proper conversion of the links would still exist.

That problem has been corrected, and it can be worked around by not
using -P.


Re: Wget sucks! [was: Re: wget in a loop?]

2005-04-25 Thread Hrvoje Niksic
"Andrzej " <[EMAIL PROTECTED]> writes:

>> > Thus it seems that it should not matter what the sequence of the 
>> > options is. If it does, I suggest that the developers of wget place 
>> > appropriate info in the manual.
>> 
>> Yes, you're right. Anyway, I have often found that it's sometimes quite tricky
>> setting up your command line to get exactly what you want.
>> The way I do it always works fine for me.
>
> Could developers confirm whether sequence of options matters or not?

The order of options does not matter.


Wget sucks! [was: Re: wget in a loop?]

2005-04-25 Thread Andrzej
> > Thus it seems that it should not matter what the sequence of the 
> > options is. If it does, I suggest that the developers of wget place 
> > appropriate info in the manual.
> 
> Yes, you're right. Anyway, I have often found that it's sometimes quite tricky
> setting up your command line to get exactly what you want.
> The way I do it always works fine for me.

Could developers confirm whether sequence of options matters or not?

> > The log shows that you haven't downloaded all the graphics from the main 
> > page, and also that you haven't downloaded that link:
> > http://lists.feedle.net/pipermail/minerals/
> 
> Well, I didn't verify it with the homepage itself. I initially tried without
> -e robots=off and got a message blocking further downloading.
> 
> With this option I could achieve further access for downloading.
> I have only tried the one link from above.

I doubt it. I tried it without the option and I did not get all the 
graphics. The -p option doesn't work as it should.

> > I could try to use the -D option, but then probably everything would be 
> > downloaded from lists.feedle.net despite the -np option being used, 
> > wouldn't it? 
> 
> I don't know exactly how these two options interact with each other.
> Ever tried the -m option?

Of course I tried, haven't you noticed in my previous posts?

> Very often when mirroring I use this line:
> 
> wget -P work:1/ -r -l 2 -H -nc -p "http://www.xxx.xx"

This is not really proper mirroring, merely downloading.

> This would have the side effect of downloading other links recursively and from
> other hosts if there are any.

You see...

> But of course you can define a list of allowed dirs and excluded dirs.
> I never tried this though.

What's the point of mirroring if I have to define the allowed and excluded 
directories every time? 
I want to run the mirror automatically, periodically from cron, and therefore 
the options should be as general as possible, so that no matter what 
changes are made on the site I would still have the site properly 
mirrored without amending the options all the time.
But of course some definitions of directories and sites might be 
necessary from time to time; as I have shown in my correspondence here, it 
is not possible to define everything in such a way that mirroring works 
properly for all the web elements and web pages on a particular site.

> After all you maybe shouldn't forget the -k option so you can browse these
> sites offline.

I use it.

My conclusion is (and I am really sorry to say this, because I liked wget 
until now): 
Wget sucks (for mirroring at least)!

It is useful only for very simple tasks, but when one wants to use it for 
mirroring sites it is almost useless; mirroring cannot be done fully properly 
with Wget, as can be seen in my previous e-mails.

Summary:
1. The -p option doesn't do what it should be doing. It doesn't download all 
graphics regardless of the source of the graphics.
2. The -P option used with the link-conversion options doesn't allow the links 
to be properly converted (at least in the current stable wget).
3. The -D and -I options do not include paths (directories) in URLs. 
4. The -np option should IMHO take into account the paths given after the 
-D and -I options.
5. In short, everything should be done to enable proper mirroring of web 
sites.

The multitude of options in Wget is just an illusion. In real life Wget cannot 
cope with mirroring sites. It is not possible in Wget to set the options in 
such a way that sites with some foreign elements (graphics) or web pages 
scattered over several servers (links to different domains) are mirrored 
correctly. And even if a site did not have the above problems, the 
problem with proper conversion of the links would still exist.

Does anyone know of any software for the Linux/Unix shell that would cope 
with the task of proper mirroring?

a.


Re: links not properly converted

2005-04-25 Thread Hrvoje Niksic
"Andrzej " <[EMAIL PROTECTED]> writes:

> I mirrored the chemfan site using those options:
>
> wget -m -nv -k -K -E -nH --cut-dirs=1 -np -t 1000 -D wbc.lublin.pl -o 
> $HOME/logiwget/logchemfan.pl -P $HOME/web/chemfan.pl -p 
> http://znik.wbc.lublin.pl/ChemFan/
>
> and unfortunately the links are not converted properly in the mirror:
> http://chemfan.pl.feedle.com/
> Try clicking on any of them.
> Instead of 
> http://chemfan.pl.feedle.com/Powitanie/index.html
> I have:
> http://chemfan.pl.feedle.com/home/andyk/web/chemfan.pl/Powitanie/index.htm
>
> How to correct options to have it correctly converted?

It is a consequence of using -P in older versions of Wget.  That bug
has been fixed in Wget 1.10 (currently in alpha).  You can work around
it by not using -P, but cd'ing to the desired directory instead.
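For instance, a sketch of that workaround applied to the command quoted 
above (the same options, with -P dropped and a cd beforehand):

$ cd $HOME/web/chemfan.pl
$ wget -m -nv -k -K -E -nH --cut-dirs=1 -np -t 1000 -D wbc.lublin.pl \
      -o $HOME/logiwget/logchemfan.pl -p http://znik.wbc.lublin.pl/ChemFan/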