Re: wget url with hash # issue

2007-09-06 Thread Micah Cowan
Aram Wool wrote:
 Hi, I'm having trouble retrieving an mp3 file from a url of the form
 
 http://www.websitename.com/HTML/typo3conf/ext/naksci_synd/mod1/index.php?mode=LATEST&pid=13&recursive=255&feeduid=1&feed=Normal&user=8&hash=d84a36bbaa1906cc07007557c6b60395
 
 entering this url in a browser opens the 'save as' dialogue box for the
 mp3, but the file isn't found if wget is used instead.

Well, since the above URL doesn't point to any real resource, we can't
really track down what problems you may be having.

Also, the URL doesn't seem to have anything to do with the subject of
your message, which mentions a "hash #" (unless you mean "hash number",
the last parameter in the query string; that's ambiguous, because the
# itself is often called a hash mark).

Since you haven't given us enough information to help you, I can only
hazard a wide guess, and wonder if the site might be explicitly blocking
wget, in which case you can use the --user-agent option to trick it (try
a value like 'Mozilla', or emulate whatever your browser sends).

 Also, is it possible to add an asterisk to a url so as to indicate that
 wget should ignore the characters before or after it?

I really don't understand what you're asking for here. If you want Wget
to ignore the characters you've specified, why specify them in the first
place?

If you mean that you want Wget to find any file that matches that
wildcard, well no: Wget can do that for FTP, which supports directory
listings; it can't do that for HTTP, which has no means for listing
files in a directory (unless it has been extended, for example with
WebDAV, to do so).

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



RE: wget url with hash # issue

2007-09-06 Thread Tony Lewis
Micah Cowan wrote:

 If you mean that you want Wget to find any file that matches that
 wildcard, well no: Wget can do that for FTP, which supports directory
 listings; it can't do that for HTTP, which has no means for listing
 files in a directory (unless it has been extended, for example with
 WebDAV, to do so).

Seems to me that is a big "unless", because we've all seen lots of websites
that have http directory listings. Apache will do it out of the box (and by
default) if there is no index.htm[l] file in the directory.

Perhaps we could have a feature to grab all or some of the files in a HTTP
directory listing. Maybe something like this could be made to work:

wget http://www.exelana.com/images/mc*.gif

Perhaps we would need an option such as --http-directory (the first thing
that came to mind, but not necessarily the most intuitive name for the
option) to explicitly tell wget how it is expected to behave. Or perhaps it
can just try stripping the filename when doing an http request and wildcards
are specified.

At any rate (with or without the command line option), wget would retrieve
http://www.exelana.com/images/ and then retrieve any links where the target
matches mc*.gif.

If wget is going to explicitly support http directory listings, it probably
needs to be intelligent enough to ignore the sorting options. In the case of
Apache, that would be things like <A HREF="?N=D">Name</A>.
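
As a rough illustration of the matching step only (a sketch under
assumptions, not an existing Wget feature): the C function below, with the
made-up name filter_listing_links, scans an already-downloaded listing page
for href targets, drops Apache-style sort links that start with "?" and
anything containing a "/", and keeps the names that match the wildcard via
POSIX fnmatch. Listing formats vary by server, so the parsing here is
deliberately naive.

```c
#include <fnmatch.h>   /* POSIX glob-style matching */
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp */

#define MAX_NAME 256

/* Scan an HTML directory listing for href targets and copy those that
   match PATTERN into OUT (at most MAX entries).  Returns the count.
   Hypothetical helper for illustration; not part of Wget. */
size_t
filter_listing_links (const char *html, const char *pattern,
                      char out[][MAX_NAME], size_t max)
{
  size_t count = 0;
  const char *p = html;

  while (*p && count < max)
    {
      if (strncasecmp (p, "href=", 5) != 0)
        {
          p++;
          continue;
        }
      p += 5;
      char quote = (*p == '"' || *p == '\'') ? *p++ : '\0';
      const char *start = p;
      /* The target ends at the closing quote, or at '>' or a space
         when the attribute value is unquoted. */
      while (*p && (quote ? *p != quote : (*p != '>' && *p != ' ')))
        p++;
      size_t len = (size_t) (p - start);
      if (len == 0 || len >= MAX_NAME)
        continue;

      char name[MAX_NAME];
      memcpy (name, start, len);
      name[len] = '\0';

      /* Skip column-sort links such as "?N=D" and anything that is
         not a plain file name in this directory. */
      if (name[0] == '?' || strchr (name, '/') != NULL)
        continue;

      if (fnmatch (pattern, name, 0) == 0)
        strcpy (out[count++], name);
    }
  return count;
}
```

Given the body of http://www.exelana.com/images/ and the pattern mc*.gif,
the caller would then request each returned name relative to that
directory; fetching the listing itself is left out of the sketch.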

Anyone have any idea how many different http directory listing formats are
out there?

Tony



Re: Myriad merges

2007-09-06 Thread Jochen Roderburg
Quoting Jochen Roderburg [EMAIL PROTECTED]:

 So it looks now to me, that the new error (local timestamp not set to remote)
 only occurs in the cases when no HEAD is used.

This (new) piece of code in http.c (line 2666 ff.) looks very suspicious to me,
especially the time_came_from_head bit:

  /* Reparse time header, in case it's changed. */
  if (time_came_from_head
      && hstat.remote_time && hstat.remote_time[0])
    {
      newtmr = http_atotm (hstat.remote_time);
      if (newtmr != -1)
        tmr = newtmr;
    }

Other than that, I have used the current svn version for a few more days in
all my work, and I would say all the issues that had bothered me in the recent
development cycles are now corrected.
I will, however, try to run a few more systematic tests with combinations of
the relevant options that I do not normally use.

What I have newly noticed are some cosmetic issues in the program output when
HTTP restarts happen. Such restarts are normally rare these days, but I have
some sites far away where bad connections and timeouts have suddenly
reappeared. One of the issues looks pretty simple; I think I can prepare a
patch myself on the weekend, when I have access to my Linux development system
at home again. I'll report details in a separate mail later, when I have
examples for the cases.

Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany



Re: Myriad merges

2007-09-06 Thread Micah Cowan
Jochen Roderburg wrote:
 Quoting Jochen Roderburg [EMAIL PROTECTED]:
 
 So it looks now to me, that the new error (local timestamp not set to remote)
 only occurs in the cases when no HEAD is used.
 
 This (new) piece of code in http.c (line 2666 ff.) looks very suspicious to
 me, especially the time_came_from_head bit:
 
   /* Reparse time header, in case it's changed. */
   if (time_came_from_head
       && hstat.remote_time && hstat.remote_time[0])
     {
       newtmr = http_atotm (hstat.remote_time);
       if (newtmr != -1)
         tmr = newtmr;
     }

The intent behind this code is to ensure that we parse the Last-Modified
date again, even if we already parsed Last-Modified, if the last one we
parsed came from the HEAD. This whole block of code that you've pasted
is new, not just the surrounding if clause; if we never sent a HEAD but
only a GET, the Last-Modified _should_ have been parsed in code that
appears before here.

...but, obviously, things aren't working quite as they should, so I need
to look into it more closely.
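
To make the intended flow concrete, here is a minimal, self-contained sketch
of that conditional. It is not Wget's actual code: parse_http_date is a
hypothetical stand-in for http_atotm that only understands the RFC 1123 date
form and does no range checking, and reparse_if_from_head is an illustrative
name for the quoted block.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for Wget's http_atotm(): understands only the
   RFC 1123 form "Thu, 06 Sep 2007 12:00:00 GMT"; returns -1 on failure. */
time_t
parse_http_date (const char *s)
{
  static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
  char mon[4];
  int day, year, hh, mm, ss;

  if (sscanf (s, "%*3s, %d %3s %d %d:%d:%d",
              &day, mon, &year, &hh, &mm, &ss) != 6)
    return (time_t) -1;

  const char *hit = strlen (mon) == 3 ? strstr (months, mon) : NULL;
  if (hit == NULL || (hit - months) % 3 != 0)
    return (time_t) -1;
  int month = (int) (hit - months) / 3 + 1;

  /* Days since 1970-01-01 for a Gregorian calendar date. */
  long long y = year - (month <= 2);
  long long era = (y >= 0 ? y : y - 399) / 400;
  long long yoe = y - era * 400;
  long long doy = (153 * (month + (month > 2 ? -3 : 9)) + 2) / 5 + day - 1;
  long long days = era * 146097 + yoe * 365 + yoe / 4 - yoe / 100
                   + doy - 719468;

  return (time_t) (days * 86400 + hh * 3600 + mm * 60 + ss);
}

/* Mirrors the quoted block: re-parse the header only when the earlier
   timestamp came from a HEAD response; otherwise keep TMR as it is. */
time_t
reparse_if_from_head (int time_came_from_head, const char *remote_time,
                      time_t tmr)
{
  if (time_came_from_head && remote_time && remote_time[0])
    {
      time_t newtmr = parse_http_date (remote_time);
      if (newtmr != (time_t) -1)
        tmr = newtmr;
    }
  return tmr;
}
```

With time_came_from_head set to 0 the function returns tmr unchanged, so in
the GET-only case the result depends entirely on whatever the earlier code
stored in tmr.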



Re: wget syntax problem ?

2007-09-06 Thread Josh Williams
On 9/6/07, Alan Thomas [EMAIL PROTECTED] wrote:

 I know this is probably something simple I screwed up, but the following
 commands in a Windows batch file return the error "Bad command or file name"
 for the wget command

 cd ..
 wget --convert-links
 --directory-prefix=C:\WINDOWS\Profiles\Alan000\Desktop\wget\CNN\
 --no-clobber http://www.cnn.com;

Don't use backslashes in filenames. If you do, use `\\` instead.


Re: wget syntax problem ?

2007-09-06 Thread Micah Cowan
Alan Thomas wrote:
 command.com
 
 By the way, Josh's and your messages are being put out to the list in
 duplicates (at least, that's what I'm seeing on my end).

Not really; we've been Cc'ing you. I don't think we knew whether you
were subscribed or not, and so Cc'd you in case you weren't. Also, many
of us just habitually hit Reply All when answering a message, so we don't
accidentally send the reply to the message's author only. :)



Re: wget syntax problem ?

2007-09-06 Thread Alan Thomas
command.com

By the way, Josh's and your messages are being put out to the list in
duplicates (at least, that's what I'm seeing on my end).

- Original Message - 
From: Micah Cowan [EMAIL PROTECTED]
To: Alan Thomas [EMAIL PROTECTED]
Cc: wget@sunsite.dk
Sent: Thursday, September 06, 2007 9:34 PM
Subject: Re: wget syntax problem ?


 Alan Thomas wrote:
  Please ignore.  It was needing the \\, like Josh said.

 Out of curiosity, what command interpreter were you using? Was this
 command.com, or something else like rxvt/Cygwin?




Re: wget syntax problem ?

2007-09-06 Thread Micah Cowan
Alan Thomas wrote:
 Please ignore.  It was needing the \\, like Josh said.

Out of curiosity, what command interpreter were you using? Was this
command.com, or something else like rxvt/Cygwin?



Re: wget syntax problem ?

2007-09-06 Thread Alan Thomas
Please ignore.  It was needing the \\, like Josh said.

- Original Message - 
From: Alan Thomas [EMAIL PROTECTED]
To: Josh Williams [EMAIL PROTECTED]; wget@sunsite.dk
Sent: Thursday, September 06, 2007 9:25 PM
Subject: Re: wget syntax problem ?


 Wget does not like my use of the --directory-prefix= option.  Anyone know
 why?





Re: wget syntax problem ?

2007-09-06 Thread Micah Cowan
Alan Thomas wrote:
 I know this is probably something simple I screwed up, but the
 following commands in a Windows batch file return the error "Bad command
 or file name" for the wget command

It sounds to me like you don't have wget in your PATH. Make sure that
wget is located somewhere where command.com (or whatever) can find it.



Re: wget syntax problem ?

2007-09-06 Thread Alan Thomas
Wget does not like my use of the --directory-prefix= option.  Anyone know
why?

- Original Message - 
From: Josh Williams [EMAIL PROTECTED]
To: Alan Thomas [EMAIL PROTECTED]
Cc: wget@sunsite.dk
Sent: Thursday, September 06, 2007 8:53 PM
Subject: Re: wget syntax problem ?


 On 9/6/07, Alan Thomas [EMAIL PROTECTED] wrote:
 
 
 I know this is probably something simple I screwed up, but the following
  commands in a Windows batch file return the error "Bad command or file
  name" for the wget command
 
  cd ..
  wget --convert-links
  --directory-prefix=C:\WINDOWS\Profiles\Alan000\Desktop\wget\CNN\
  --no-clobber http://www.cnn.com;

 Don't use backslashes in filenames. If you do, use `\\` instead.



Re: wget syntax problem ?

2007-09-06 Thread Josh Williams
On 9/6/07, Micah Cowan [EMAIL PROTECTED] wrote:
 Not really; we've been Cc'ing you. I don't think we knew whether you
 were subscribed or not, and so Cc'd you in case you weren't. Also, many
 of us just habitually hit Reply All when answering a message, so we don't
 accidentally send the reply to the message's author only. :)

aye. Gmail doesn't have that problem, though. If it finds a duplicate
message from a mailing list, it only shows me the one from the list.
Kind of nice.


Re: Files returned by ASP

2007-09-06 Thread Micah Cowan
Alan Thomas wrote:
 Is there a way to use wget to get files from links that result
 from Active Server Pages (ASPs) on a web page?  For example, to get the
 files in the links on the page returned by the URL
 http://www.onr.navy.mil/about/conferences/rd_partner/2007/presentations_03.asp.
  
 
 Thanks, Alan

Sure, check out what the Wget manual has to say about recursive fetching:

http://www.gnu.org/software/wget/manual/html_node/Recursive-Download.html#Recursive-Download

HTH,
Micah J. Cowan


Announcing... The Wget Wgiki!

2007-09-06 Thread Micah Cowan
The main informational site for GNU Wget is now at
http://wget.addictivecode.org/, the Wget Wgiki.

  --

The original motivation for starting a wiki for Wget was that I needed a
forum for collaboration on specifications and design for future features
in Wget, and particularly in what we've been calling Wget 2.0, the
next generation of Wget.

Features that have been (tentatively) suggested or planned for Wget
2.0 include:

  * Support for multiple connections simultaneously
  * Configuration options on a per-host and/or per-initial URI subpath
    basis.
  * Accept/reject (and others) based on MIME type.
  * Support for the use of regular expressions.
  * A recursive-fetch metadatabase, to save download information such as
    mappings between local filenames and originating URIs, MIME types,
    HTTP entity identifiers, etc.
  * A plugin architecture.
  * Support for parsing of non-HTML files for links to follow.
  * Support for handing-off specific HTML elements to plugins for
    special handling
  * Support for extending Wget with new protocols
  * Better encapsulation of the file-system, to hide local filename
    restrictions and such from the download logic.
  * Support for Internationalized Resource Identifiers (IRIs).
  * Some level of JavaScript support **
  * Support for the Metalink format **

 ** For various reasons, JavaScript and Metalink support will probably
 not be part of canonical Wget, but would take advantage of the plugin
 architecture and be distributed separately from the core Wget source.
 Development for these features might be separate from core Wget
 development.

Some of these things necessitate a complete restructuring of Wget's
logic, very possibly a complete or near-rewrite. It is also possible
that the configuration and command-line interface syntaxes would need to
be reimagined, in which case a name change for the next generation
Wget might begin to show merit.

The feature specifications and design discussions for these elements
will live at http://wget.addictivecode.org/FeatureSpecifications. I have
started a few of them off, most still need to be started, and all need
help.

 --

An aside: I do not want to give the idea that Wget is going to go from a
Swiss Army Knife to a Combination
Hand-pistol/tank/aircraft-carrier/missile-launch-silo ;)
As I see it, Wget's major boons have been its relatively small
footprint, its speed and efficiency, and its ability to (usually) Do
what I want. I do not wish to abandon these things. This was a major
factor in the decision to isolate features like Metalink and JavaScript
into plugins: with a plugin architecture, if the users /want/ the
Combination Hand-pistol/..., they can just load up the tank and
missile-silo modules! ;)

 --

At any rate, I felt that having a wiki for discussion of these things
would prove invaluable, so I started work on this last week. But while I
was working on these things, it became more and more obvious how much of
a benefit it could be in serving as the main repository for even
general, non-developer-oriented information for Wget. This is a somewhat
abrupt turn from my desire to make the gnu.org site the main source of
information about Wget, but I believe it'll be much easier in the long
run.

Please do check the site out, and help to improve it! Most of the content
from the old site should have moved to the wiki (the old site has
already been updated to direct readers there).

  - http://wget.addictivecode.org/FeatureSpecifications
  Home for various features that need sketching out (these are
  intended to be informal specifications, not particularly rigorous;
  just enough to know what we are doing).

  - http://wget.addictivecode.org/Faq
  The FAQ has been updated somewhat, probably worth looking over.

  - http://wget.addictivecode.org/TitleIndex
  We don't have that many pages yet; here's the full list. ;)

