Re: help installing opera on xo

2008-02-09 Thread Josh Williams
On Feb 8, 2008 9:58 PM, Jacqui Lahr [EMAIL PROTECTED] wrote:
  hi. i've been trying to install opera on the olpc xo with info from
 the wiki opera site,
  and i get messages to contact you. i've tried both codes(?) with the
 tarball and without. i have been using macs since the 512, and in my 75
 yr. old ignorance i thought i could just type it in and enter
 it... not!!! i was able to upgrade the build after several attempts and
 after reading the glowing reports about opera i thought i could handle
 it... not!!! all the if onlys are driving me crazier.. save me
 please.. thanks, Les Lahr

I fail to understand what this has to do with GNU Wget.


Re: [PATCH] Reduce COW sections data by marking data constant

2008-01-31 Thread Josh Williams
On Jan 31, 2008 8:21 PM, Diego 'Flameeyes' Pettenò [EMAIL PROTECTED] wrote:
 char *foo = ab - 4 + 3 = 9 bytes

How did you get 9?


Re: Redirects across hosts

2007-12-12 Thread Josh Williams
On Dec 12, 2007 1:46 PM, Micah Cowan [EMAIL PROTECTED] wrote:
 And, what do you think about enabling that option by default when
 recursive mode is on?

Well, I think it's obvious that we need the option. But I don't think
it should be enabled by default. By default, shouldn't we want to
capture as much information as possible? IMO, wget should only be
limited by options the user explicitly passes.


Re: Wget exit codes

2007-12-09 Thread Josh Williams
On Dec 9, 2007 7:03 PM, Stuart Moore [EMAIL PROTECTED] wrote:
 Could the exit code used be determined by a flag? E.g. by default it
 uses unix convention, 0 for any success; with an
 --extended_error_codes flag or similar then it uses extra error codes
 depending on the type of success (but for sanity uses the same codes
 for failure with or without the flag)

 That should allow both of you to use it for scripting.

I like this idea.

Like Micah said, there should _NOT_ be a non-zero return value for any
case in which the command was successful, even if it didn't download anything
(if that's what we asked it to do, then it was successful). I think it
would behoove us to have multiple return values for different
*errors*, but not for different cases of success.

I think this would be a very simple and helpful patch. (Well, simple
may be an overstatement, because we'd have to go through every
possible point of failure and assign it a return value.. mah.)
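
For what it's worth, here's roughly how a script could use it once the
codes are nailed down (the flag name is Stuart's suggestion; the numeric
codes below are purely made up for illustration):

wget --extended_error_codes http://example.com/some/file
case $? in
  0) echo "success (even if nothing new was downloaded)" ;;
  4) echo "network failure - worth retrying later" ;;
  8) echo "server sent an error response" ;;
  *) echo "failed for some other reason" ;;
esac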


Re: Mirroring a site on the Internet Archive

2007-12-07 Thread Josh Williams
On 12/7/07, Brian [EMAIL PROTECTED] wrote:
 For the life of me, I cannot convince wget to download an old copy of a
 website from the Internet Archive. I think the url within a url is somehow
 messing it up..

  wget -e robots=off --base=
 http://web.archive.org/web/19990125085924/http://gnu.org/
 -r -Gbase
 http://web.archive.org/web/19990125085924/http://gnu.org/

 How can I get this to work?

 Cheers,
 Brian

Hey!

We've seen this issue a lot. IIRC, the --base option does no good in
this instance because the problem is actually a parsing error.

I hacked around it a bit once, and I was able to make it download the
files, but in an extremely funky directory hierarchy - so horrible
that I couldn't even find the files I wanted.

The fact of the matter is that wget (in its current state) cannot
handle archive.org websites. Sorry :-(

You're welcome to have a go at the code to see if you can figure it
out, though :-)


Re: .1, .2 before suffix rather than after

2007-11-29 Thread Josh Williams
On Nov 29, 2007 6:20 PM, David Ginger
[EMAIL PROTECTED] wrote:
 So can I ask: is a wget2 actually being developed?

Go ahead, but I'll answer that question before you do ;-)

The answer is no - not at the moment. But we've been discussing it for
several months. It will be a while before any code is actually
written.


Re: Wget Name Suggestions

2007-11-29 Thread Josh Williams
On 11/29/07, Micah Cowan [EMAIL PROTECTED] wrote:
 A new discussion page on the wiki:

 http://wget.addictivecode.org/Wget2Names

 (Does it sound a bit too much like something that extracts names from
 wget output? :) )

I really like the name `fetch` because it does what it says it does.
It's more UNIX-like than the other names :-)


Re: wget2

2007-11-29 Thread Josh Williams
On 11/29/07, Micah Cowan [EMAIL PROTECTED] wrote:
   - Alan has prior history on this list. Check the archives:

yeah, I remember him. And is it just me, or does it seem that
something's going to go down tonight with wget 2? ;-)


Re: .1, .2 before suffix rather than after

2007-11-29 Thread Josh Williams
On 11/29/07, Micah Cowan [EMAIL PROTECTED] wrote:
 Yeah... of course they won't be able to edit the wiki that way.

I doubt you'd get the slashdot effect from just the people who're
interested in editing the wiki. You may get a handful of developers
and a few thousand people who only want to read it :-)


Re: .1, .2 before suffix rather than after

2007-11-29 Thread Josh Williams
On 11/29/07, Micah Cowan [EMAIL PROTECTED] wrote:
 Well, the trouble with that is that I'm running all of Wget's stuff
 (plus my own personal mail and whatnot) on a little VPS. I'm rather
 concerned that the traffic will kill me. I'm already worried about it
 potentially hitting SlashDot or Digg because it's the first Wget release
 in quite a while. D:

Tada! http://en.wikipedia.org/wiki/Coral_Content_Distribution_Network

There's also archive.org.


Re: wget2

2007-11-29 Thread Josh Williams
On 11/29/07, Micah Cowan [EMAIL PROTECTED] wrote:
 Well don't look at _me_; I'm not the one who brought it up! ;)

heh. I wasn't looking for some grand unveiling. It just seems to be
attracting a lot of attention, and we should probably start putting
more effort into it.

I'm going to re-read some of the current Wget code tonight and start
playing around with my own attempt at a wget2. I think we should
simplify the name for this release to something like the `fetch`
command (which is available, btw ;-).


Re: .1, .2 before suffix rather than after

2007-11-29 Thread Josh Williams
On 11/29/07, Micah Cowan [EMAIL PROTECTED] wrote:
 I dunno, man, I think our current wget2 roadmap goals are already pretty
 wild-and-crazy. ;)

I agree. I think we should create an announcement asking for
developers to help and submit it to digg and slashdot. The new
features may get some excitement going and start rumors. :-P

^^ in all seriousness ^^


Re: wget2

2007-11-29 Thread Josh Williams
On 11/29/07, Alan Thomas [EMAIL PROTECTED] wrote:
 Sorry for the misunderstanding.  Honestly, Java would be a great language
 for what wget does.  Lots of built-in support for web stuff.  However, I was
 kidding about that.  wget has a ton of great functionality, and I am a
 reformed C/C++ programmer (or a recent Java convert).  But I love using
 wget!

I vote we stick with C. Java is slower and more prone to environmental
problems. Wget needs to be as independent as we can possibly make it.
A lot of the systems that wget is used on (including mine) do not even
have Java installed. That would be a HUGE requirement for many people.


Re: .1, .2 before suffix rather than after

2007-11-04 Thread Josh Williams
On 11/4/07, Micah Cowan [EMAIL PROTECTED] wrote:
 Christian Roche has submitted a revised version of a patch to modify the
 unique-name-finding algorithm to generate names in the pattern
 foo-n.html rather than foo.html.n. The patch looks good, and will
 likely go in very soon.

That's something I had meant to submit a bug report for a while back,
but somehow never found the time to do it. I guess it wasn't my top
priority since GNU/Linux is usually smart enough to ignore the file
extensions anyways.

 A couple of minor detail questions: what do you guys think about using
 foo.n.html instead of foo-n.html? And (this one to Gisle), how would
 this naming convention affect DOS (and, BTW, how does the current one
 hold up on DOS)?

Well, this problem is mainly for win32 users, so I think we need to
keep sloppy coding in mind. It's been my experience that *many* win32
programs will treat everything after the first period as the file
extension.

Honestly, I don't see any reason to risk the annoyance of these kinds
of bugs. Just go with the dash.

(On a side note, have you thought of running FreeDOS in a virtual machine?)


Re: .1, .2 before suffix rather than after

2007-11-04 Thread Josh Williams
On 11/4/07, Hrvoje Niksic [EMAIL PROTECTED] wrote:
 It just occurred to me that this change breaks backward compatibility.
 It will break scripts that try to clean up after Wget or that in any
 way depend on the current naming scheme.


You mean the scripts that fix the same problem this patch does? ;-)


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-10-26 Thread Josh Williams
On 10/26/07, Micah Cowan [EMAIL PROTECTED] wrote:
 And, of course, when I say there would be two Wgets, what I really
 mean by that is that the more exotic-featured one would be something
 else entirely than a Wget, and would have a separate name.

I think the idea of having two Wgets is good. I too have been
concerned about the resources required in creating the all-out version
2.0. The current code for Wget is a bit mangled, but I think the basic
concepts surrounding it are very good ones. Although the code might
suck for those trying to read it, I think it could become quite good
with a little regular maintenance.

There still remains the question, though, of whether version 2 will
require a complete rewrite. Considering how fundamental these changes
are, I don't think we would have much of a choice. You mentioned that
they could share code for recursion, but I don't see how. IIRC, the
code for recursion in the current version is very dependent on the
current methods of operation. It would probably have to be rewritten
to be shared.

As for libcurl, I see no reason why not. Also, would these be two
separate GNU projects? Would they be packaged in the same source code,
like finch and pidgin?

I do believe the next question at hand is what version 2's official
mascot will be. I propose Lenny the tortoise ;)

   _  ..
Lenny -  (_\/  \_,
'uuuu~'


Re: subscribing from this list

2007-10-15 Thread Josh Williams
On 10/15/07, patrick robinson [EMAIL PROTECTED] wrote:
 Hello,

  I want to unsubscribe from this list but lost my registration e-mail.
 How is this performed?

You can find this (and other information) on the Wget wiki.
http://wget.addictivecode.org/

To unsubscribe from a list, send an email to
[EMAIL PROTECTED] For more information on list
commands, send an email to [EMAIL PROTECTED]


Re: subscribing from this list

2007-10-15 Thread Josh Williams
On 10/15/07, Micah Cowan [EMAIL PROTECTED] wrote:
 Note that this doesn't help him much if he's lost his registration e-mail.

 Patrick, you'll probably have to go bug the staff at www.dotsrc.org, who
 hosts this list; send an email to [EMAIL PROTECTED]

E-mail *address* or just the e-mail? I don't see how having the e-mail
is important.


Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-13 Thread Josh Williams
On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote:
 OK, so let's go back to basics for a moment.

 wget's default behavior is to use all available bandwidth.

 Is this the right thing to do?

 Or is it better to back off a little after a bit?

 Tony

IMO, this should be handled by the operating system, not the
individual applications. That's one of the reasons I believe this
should be a module instead, because it's more or less a hack to patch
what the environment should be doing for wget, not vice versa.

In my experience, GNU/Linux tends to consume all the resources
indiscriminately, seemingly on a first-come, first-served, *until you're
done* basis. This should be brought to the attention of the LKML.

However, other operating systems do not seem to have this problem as
much. Even Windows networks seem to prioritise packets.

This is a problem I've been having major headaches with lately. It
would be nice if wget had a patch for this problem, but that would not
solve the problem of my web browser or sftp client consuming all the
network resources.


Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-13 Thread Josh Williams
On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote:
 Well, you may have such problems but you are very much reaching in
 thinking that my --linux-percent has anything to do with any failing
 in linux.

 It's about dealing with unfair upstream switches, which, I'm quite
 sure, were not running Linux.

 Let's not hijack this into a linux-bash.

I really don't know what you were trying to say here. I use GNU/Linux.


Re: WGET Negative Counter Glitch

2007-10-13 Thread Josh Williams
On 10/13/07, Micah Cowan [EMAIL PROTECTED] wrote:
 Hi Joshua,

 There is a very strong likelihood that this has been fixed in the
 current development version of Wget. Could you try with that?

 If you're a Windows user, you can get a binary from
 http://www.christopherlewis.com/WGet/WGetFiles.htm; otherwise, you'd
 need to compile from the repositories source:
 http://wget.addictivecode.org/RepositoryAccess

I believe you're right. IIRC, this issue was closed on Savannah a
couple months ago. I'd find the ticket number, but I don't have time
ATM.

Good luck, Joshua! :-)


Re: working on patch to limit to percent of bandwidth

2007-10-12 Thread Josh Williams
On 10/12/07, Tony Godshall [EMAIL PROTECTED] wrote:
 Again, I do not claim to be unobtrusive.  Merely to reduce
 obtrusiveness.  I do not and cannot claim to be making wget *nice*,
 just nicER.

 You can't deny that dialing back is nicer than not.

Personally, I think this is a great idea. But I do agree that the
documentation is a bit messy right now (as well as the code). If this
doesn't make it into the current trunk, I think it'd make a great
module in version 2.


Re: working on patch to limit to percent of bandwidth

2007-10-12 Thread Josh Williams
On 10/12/07, Hrvoje Niksic [EMAIL PROTECTED] wrote:
 Personally I don't see the value in attempting to find out the
 available bandwidth automatically.  It seems too error prone, no
 matter how much heuristics you add into it.  --limit-rate works
 because reading the data more slowly causes it to (eventually) also be
 sent more slowly.  --limit-percentage is impossible to define in
 precise terms, there's just too much guessing.

Yeah, that is a good point. Hence, I vote for it to become a module.
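
Hrvoje's point about why --limit-rate works, in a nutshell (a simplified
sketch of the general read-throttle idea, not the actual code in retr.c):

#include <unistd.h>

/* Sleep long enough that, on average, we never read faster than
   limit_bps bytes per second.  bytes_read and elapsed_secs cover the
   transfer so far. */
static void
throttle (long bytes_read, double elapsed_secs, long limit_bps)
{
  double should_have_taken = (double) bytes_read / limit_bps;
  if (should_have_taken > elapsed_secs)
    usleep ((useconds_t) ((should_have_taken - elapsed_secs) * 1e6));
}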


Re: working on patch to limit to percent of bandwidth

2007-10-08 Thread Josh Williams
On 10/8/07, A. P. Godshall [EMAIL PROTECTED] wrote:
 Anyhow, does this seem like something others of you could use?  Should
 I submit the patch to the submit list or should I post it here for
 people to hash out any parameterization niceties etc first?

Go ahead and send it on here so we can comment on the code :-)


Re: bug in escaped filename calculation?

2007-10-04 Thread Josh Williams
On 10/4/07, Brian Keck [EMAIL PROTECTED] wrote:
 I would have sent a fix too, but after finding my way through http.c and
 retr.c I got lost in url.c.

You and me both. A lot of the code needs to be rewritten... there's a lot of
spaghetti code in there. I hope Micah chooses to do a complete
rewrite for version 2 so I can get my hands dirty and understand the
code better.


Re: Wrong log output for wget -c

2007-09-13 Thread Josh Williams
On 9/9/07, Jochen Roderburg [EMAIL PROTECTED] wrote:

 Hi,

 This is now an easy case for a change  ;-)

 In the log output for wget -c we have the line:

The sizes do not match (local 0) -- retrieving.

 This shows always 0 as local size in the current svn version.

 The variable which is printed here is local_size which is initialized to 0 
 and
 used nowhere else. I think this variable was just forgotten on a recent code
 reorganization. Comparing an old version with the current I think the
 information is now in hstat.orig_file_size, I attach my little patch for
 this.

 I have also seen another much more complicated and rare log output problem 
 with
 restarted requests, but so far I was not able to reconstruct a real-life
 example for it again. It happens when on multiple retries the Range request
 is not honoured by the server and the transfer starts again at byte 0. It looked
 like not all variables for the display of the progress bar are correctly
 adjusted to this situation. I'll keep on trying  ;-)

Hi! Thanks for your contribution. I just looked over your patch and it
looks good. I've committed the changes to:
svn://[EMAIL PROTECTED]/wget/branches/bugs/b21057

After Micah (the maintainer) inspects it, it should go right into the
trunk. Thanks!


Re: forum download, cookies?

2007-09-13 Thread Josh Williams
On 9/12/07, Juhana Sadeharju [EMAIL PROTECTED] wrote:

 A forum has topics which are available only for members.
 How to use wget for downloading copy of the pages in that
 case? How to get the proper cookies and how to get wget to
 use them correctly? I use IE in PC/Windows and wget in
 a unix computer. I could use Lynx in the unix computer
 if needed.

 (PC/Windows has Firefox but I cannot install anything new.
 If Firefox has a downloader plugin suitable for forum
 downloading, that would be ok.)

 Juhana

Firefox stores a cookies.txt file in the profile directory. In
Windows, I believe this is located in C:/Documents and
Settings/{username}/Application
Data/Mozilla/firefox/profiles/PROFILE/cookies.txt.

GNU Wget is compatible with this cookies file. Just use the
`--load-cookies file` option.
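
So, after copying that file over to the unix machine, something along
these lines should work (URL made up, of course):

wget --load-cookies cookies.txt -r -np http://forum.example.com/members-only/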


Re: Abort trap

2007-09-13 Thread Josh Williams
On 9/11/07, Hex Star [EMAIL PROTECTED] wrote:
 When I try to execute the command (minus quotes) wget -P ftp.usask.ca -r
 -np -passive-ftp ftp://ftp.usask.ca/pub/mirrors/apple/
 wget works for a bit and then terminates with the following error:

 xmalloc.c:186: failed assertion `ptr !=NULL'
 Abort trap

 What causes this error? What does this error mean? Is this due to a server
 misconfiguration? Thanks! :)

 P.S. I am not subscribed to this list, please cc all replies to me...thanks!
 :)

A failed assertion means that at some point along the line, one of the
variables' values was not what it should have been.

I'll check into it. Thanks!


Re: Wget automatic download from RSS feeds

2007-09-13 Thread Josh Williams
On 9/12/07, Erik Bolstad [EMAIL PROTECTED] wrote:
 Hi!
 I'm doing a master thesis on online news at the University of Oslo,
 and need a software that can download html pages based on RSS feeds.

 I suspect that Wget could be modified to do this.

 - Do you know if there are any ways to get Wget to read RSS files and
 download new files every hour or so?
 - If not: Have you heard about software that can do this?

 I am very grateful for all help and tips.

Wget does not do this. That would be a great feature, but I don't
believe parsing the RSS feed is Wget's job. Wget just fetches the
files.

I recommend you look for a program that simply parses the RSS feed and
dumps the URLs to a file for Wget to fetch. Piping.. that's what UNIX
is all about ;-)
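
Just to illustrate the piping idea, something like this would work in a
pinch (the grep/sed bit is only a crude stand-in for whatever feed
parser you end up with, and it assumes the feed's <link> elements
aren't split across lines); stick it in a cron job for the hourly part:

wget -q -O - http://example.com/news/feed.rss \
  | grep -o '<link>[^<]*</link>' \
  | sed 's/<\/\?link>//g' \
  | wget -N -i -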

I don't have any recommendations, unfortunately. If you aren't able to
find one, let me know, and I'll try to come up with one.

Josh


Re: Abort trap

2007-09-13 Thread Josh Williams
On 9/13/07, Hex Star [EMAIL PROTECTED] wrote:
 wget 1.9+cvs-dev

Try it with either the latest release or (preferably) the subversion
trunk and let us know if you still have the same problem. The version
you're using is an old development snapshot, so we can safely assume
it contains plenty of bugs that have since been fixed.
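
If it helps, building from the trunk is roughly this (the checkout URL
follows the same pattern as the bug branches mentioned elsewhere on this
list, so double-check it against the RepositoryAccess page):

svn co svn://addictivecode.org/wget/trunk wget-trunk
cd wget-trunk
./autogen.sh && ./configure && make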


Re: Announcing... The Wget Wgiki!

2007-09-07 Thread Josh Williams
On 9/7/07, Micah Cowan [EMAIL PROTECTED] wrote:
 Doh! Of course, it's .org. Fortunately all the other links, including
 the ones from the site at gnu.org, seem to be correct.

Unfortunately for you, your typo is now an official piece of free
software history! :D

Just poking. :-P


Re: wget syntax problem ?

2007-09-06 Thread Josh Williams
On 9/6/07, Alan Thomas [EMAIL PROTECTED] wrote:


I know this is probably something simple I screwed up, but the following
 commands in a Windows batch file return the error "Bad command or file name"
 for the wget command

 cd ..
 wget --convert-links
 --directory-prefix=C:\WINDOWS\Profiles\Alan000\Desktop\wget\CNN\
 --no-clobber http://www.cnn.com

Don't use backslashes in filenames. If you do, use `\\` instead.


Re: wget syntax problem ?

2007-09-06 Thread Josh Williams
On 9/6/07, Micah Cowan [EMAIL PROTECTED] wrote:
 Not really; we've been Cc'ing you. I don't think we knew whether you
 were subscribed or not, and so Cc'd you in case you weren't. Also, many
 of us just habitually hit Reply All to hit the message, so we don't
 accidentally send it to the message's author only. :)

aye. Gmail doesn't have that problem, though. If it finds a duplicate
message from a mailing list, it only shows me the one from the list.
Kind of nice.


Re: wget ignores --user and --password if you have a .netrc

2007-09-04 Thread Josh Williams
On 9/3/07, Andreas Kohlbach [EMAIL PROTECTED] wrote:
 Hi,

 though the man page of wget mentions .netrc, I assume this is a bug.

 For my understanding if you provide a --user=user and --password=password
 at the command line this should overwrite any setting elsewhere, as in
 the .netrc. It doesn't. And it took me quite some time and bothering
 other guys to realise that it seems wget is ignoring --user and
 --password at the command line if a .netrc exists with the matching
 content.

Indeed. Whether this is a bug or not needs some discussion, I think,
but here's a patch to fix your problem.

Index: src/netrc.c
===
--- src/netrc.c (revision 2376)
+++ src/netrc.c (working copy)
@@ -59,6 +59,7 @@
 search_netrc (const char *host, const char **acc, const char **passwd,
   int slack_default)
 {
+  if (strlen(opt.user) && strlen(opt.passwd)) return;
   acc_t *l;
   static int processed_netrc;
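
With that applied, explicitly passing both credentials should win over
whatever is in ~/.netrc, e.g. (host and login obviously made up):

wget --user=jdoe --password=hunter2 ftp://ftp.example.com/somefile.tar.gz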


Re: Fix for Warning C4142 in windows

2007-09-01 Thread Josh Williams
On 9/2/07, Christopher G. Lewis [EMAIL PROTECTED] wrote:
  Warning_C4142_Fix.diff

 Windows added support of intptr_t and uintptr_t with Visual Studio 2003
 (MSVER 1310)

 This patch removes 60+ warnings from the MSWindows build

Holy crap, that's a lot of warnings for such a small patch. Thanks!


Re: I can download with a browser, but not with wget

2007-08-23 Thread Josh Williams
On 8/23/07, Micah Cowan [EMAIL PROTECTED] wrote:
 --user-agent Mozilla does the trick. Apparently Intel's website does
 not like wget. :)

Stinky buzzards. What did we ever do to them?


Re: url.c (in_url_list_p): why bool verbose?

2007-08-22 Thread Josh Williams
On 8/22/07, Josh Williams [EMAIL PROTECTED] wrote:
  In src/url.c, function in_url_list_p, there is an argument called
 bool verbose, but it is never used. Furthermore, the verbose option
 is defined in our options struct.

 Should this argument be removed?

Below is a patch for this change.

Index: src/spider.c
===
--- src/spider.c(revision 2336)
+++ src/spider.c(working copy)
@@ -67,7 +67,7 @@
 };

 static bool
-in_url_list_p (const struct url_list *list, const char *url, bool verbose)
+in_url_list_p (const struct url_list *list, const char *url)
 {
   const struct url_list *ptr;

@@ -100,7 +100,7 @@
   list->url = referrer ? xstrdup (referrer) : NULL;
   hash_table_put (visited_urls_hash, xstrdup (url), list);
 }
-  else if (referrer && !in_url_list_p (list, referrer, false))
+  else if (referrer && !in_url_list_p (list, referrer))
 {
   /* Append referrer at the end of the list */
   struct url_list *newnode;


Re: url.c (in_url_list_p): why bool verbose?

2007-08-22 Thread Josh Williams
On 8/22/07, Micah Cowan [EMAIL PROTECTED] wrote:
 This looks very reasonable, Josh. Feel free to check this change
 directly into the trunk (with a note in src/ChangeLog).

That I will, when I get home tonight. The stupid network at the
college is blocking subversion. I'm going to have to come up with some
sort of proxy or something, because this is really bugging the
bejebers out of me.

Do you want this in the main trunk?


Re: -R and HTML files

2007-08-22 Thread Josh Williams
On 8/22/07, Micah Cowan [EMAIL PROTECTED] wrote:
 What would be the appropriate behavior of -R then?

I think the default option should be to download the html files to
parse the links, but it should discard them afterwards if they do not
match the acceptance list.

But, as you stated, I believe that the user _should_ be given the choice.


Re: --spider requires --recursive

2007-08-18 Thread Josh Williams
On 8/18/07, Micah Cowan [EMAIL PROTECTED] wrote:
 I'm not convinced. To me, the name spider implies recursion, and it's
 counter-intuitive for it not to.

 As to wasted functionality, what's wrong with -O /dev/null (or NUL or
 whatever) for simply checking existence?

I see his point. The difference is that the --spider option will only
look for broken links on a given page, such as the bookmarks.html
example. If we were to force recursion, it would fan out across the
different pages. Perhaps we only want to check the links on _that_one_
page. Recursion wouldn't be helpful in that instance.

But it could be argued that you could just set the recursion level to
zero (or is it one?) to prevent that behavior.
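
In other words, something like this ought to check just the links on
that one page (assuming a depth of 1 is the right cutoff, which is
exactly the fuzzy part):

wget --spider -r -l 1 http://www.example.com/bookmarks.html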


--spider requires --recursive

2007-08-17 Thread Josh Williams
Is there any particular reason the --spider option requires
--recursive? As it is now, we run into the following error if we omit
--recursive:

[EMAIL PROTECTED]:~/cprojects/wget/src$ ./wget
http://www.google.com --spider
Spider mode enabled. Check if remote file exists.
--00:37:21--  http://www.google.com/
Resolving www.google.com... 209.85.165.147, 209.85.165.104, 209.85.165.99, ...
Connecting to www.google.com|209.85.165.147|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [text/html]
Remote file exists but recursion is disabled -- not retrieving.

[EMAIL PROTECTED]:~/cprojects/wget/src$

The only explanation I can think of is that it serves the purpose of
checking whether a file exists.

Should --spider imply --recursive?


Re: Manual and --help difference

2007-08-02 Thread Josh Williams
On 8/2/07, dmitry over [EMAIL PROTECTED] wrote:
 Hi,

 In `man wget` I see text
 ---[ cut ]---
  --http-user=user
--http-password=password
 [..]
 but in `wget --help` I see

 --http-user=USER  set http user to USER.
 --http-passwd=PASSset http password to PASS.

 check --http-passwd and --http-password and fix it please.

What version of wget are you using? I don't see this problem in 1.10.2
_or_ in the trunk.


Re: patch: prompt for password

2007-07-25 Thread Josh Williams

On 7/25/07, Matthew Woehlke [EMAIL PROTECTED] wrote:

Any reason you're not replying to the list? (Unless there is, please
direct replies to the list.)


No, I was in a hurry at the time and forgot to change the e-mail
address before I sent it.


I personally *must have* this patch; storing my login password in a file
is not acceptable ;-), and I have a script that needs to use wget. Said
script is already interactive, so asking for a password is not an issue
in this context. Micah has already stated that he intends to add this
functionality eventually, but if he didn't I would be forced to
perpetually maintain this patch on my own, or else... use something
similar to wget that is not wget :-).


We've been discussing optional user interaction a lot lately, but we
haven't decided how to go about it yet. Even so, I'm sure your patch
may come in handy.

If you're interested, we could use someone to help develop an
*optional* user interface (browse the list archives if you need some
ideas).


Re: ignoring robots.txt

2007-07-18 Thread Josh Williams

On 7/18/07, Maciej W. Rozycki [EMAIL PROTECTED] wrote:

 There is no particular reason, so we do.


As far as I can tell, there's nothing in the man page about it.


Re: Why --exclude-directories, and not --exclude-paths?

2007-07-17 Thread Josh Williams

On 7/17/07, Hrvoje Niksic [EMAIL PROTECTED] wrote:

-R allows excluding files.  If you use a wildcard character in -R, it
will treat it as a pattern and match it against the entire file name.
If not, it will treat it as a suffix (not really an extension, it
doesn't care about . being there or not).  -X always excludes
directories and allows wildcards.

It was supposed to be a DWIM thing.


I wrote a patch to add an option --exclude-files. (That was me he was
talking about, btw.) It may not be *technically* necessary at this
point since -R allows it, but this is more precise, and its job is
more clearly defined.

I haven't committed it to the svn yet, but you can see it here:
https://savannah.gnu.org/bugs/?20454


Re: Maximum 20 Redirections HELP!!!

2007-07-16 Thread Josh Williams

On 7/16/07, Jaymz Goktug YUKSEL [EMAIL PROTECTED] wrote:

Hello everyone,

Is there a command to override the
maximum redirections?


Attached is a patch for this problem. Let me know if you have any
problems with it. It was written for the latest trunk in the svn, so
you *may* have to compile an unstable release.

You can browse the source (with this patch) at
http://addictivecode.org/svn/wget/branches/bugs/b20499/. To download
it, run:

svn co svn://addictivecode.org/wget/branches/bugs/b20499 wget-maxredirect

To compile, run:

./autogen.sh
./configure
make

As this is an unstable release, you probably do not wish to install
it, but rather to run it from ./src/wget with the --max-redirect option.
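
For example (the URL and the limit are just placeholders):

./src/wget --max-redirect=50 http://example.com/some/redirect-happy/page
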
Index: src/options.h
===
--- src/options.h	(revision 2280)
+++ src/options.h	(working copy)
@@ -38,6 +38,8 @@
   bool recursive;		/* Are we recursive? */
   bool spanhost;			/* Do we span across hosts in
    recursion? */
+  int  maxredirect;		/* Maximum number of times we'll allow
+   a page to redirect. */
   bool relative_only;		/* Follow only relative links. */
   bool no_parent;		/* Restrict access to the parent
    directory.  */
Index: src/init.c
===
--- src/init.c	(revision 2280)
+++ src/init.c	(working copy)
@@ -182,6 +182,7 @@
   { "loadcookies",	&opt.cookies_input,	cmd_file },
   { "logfile",		&opt.lfilename,		cmd_file },
   { "login",		&opt.ftp_user,		cmd_string },/* deprecated*/
+  { "maxredirect",	&opt.maxredirect,	cmd_number_inf },
   { "mirror",		NULL,			cmd_spec_mirror },
   { "netrc",		&opt.netrc,		cmd_boolean },
   { "noclobber",	&opt.noclobber,		cmd_boolean },
Index: src/retr.c
===
--- src/retr.c	(revision 2280)
+++ src/retr.c	(working copy)
@@ -567,13 +567,7 @@
   return dlrate;
 }
 
-/* Maximum number of allowed redirections.  20 was chosen as a
-   reasonable value, which is low enough to not cause havoc, yet
-   high enough to guarantee that normal retrievals will not be hurt by
-   the check.  */
 
-#define MAX_REDIRECTIONS 20
-
 #define SUSPEND_POST_DATA do {			\
   post_data_suspended = true;			\
   saved_post_data = opt.post_data;		\
@@ -746,10 +740,10 @@
   mynewloc = xstrdup (newloc_parsed-url);
 
   /* Check for max. number of redirections.  */
-  if (++redirection_count > MAX_REDIRECTIONS)
+  if (++redirection_count > opt.maxredirect)
 	{
 	  logprintf (LOG_NOTQUIET, _("%d redirections exceeded.\n"),
-		 MAX_REDIRECTIONS);
+		 opt.maxredirect);
 	  url_free (newloc_parsed);
 	  url_free (u);
 	  xfree (url);
Index: src/main.c
===
--- src/main.c	(revision 2280)
+++ src/main.c	(working copy)
@@ -189,6 +189,7 @@
 { "level", 'l', OPT_VALUE, "reclevel", -1 },
 { "limit-rate", 0, OPT_VALUE, "limitrate", -1 },
 { "load-cookies", 0, OPT_VALUE, "loadcookies", -1 },
+{ "max-redirect", 0, OPT_VALUE, "maxredirect", -1 },
 { "mirror", 'm', OPT_BOOLEAN, "mirror", -1 },
 { "no", 'n', OPT__NO, NULL, required_argument },
 { "no-clobber", 0, OPT_BOOLEAN, "noclobber", -1 },
@@ -497,6 +498,8 @@
 N_("\
--header=STRING insert STRING among the headers.\n"),
 N_("\
+   --max-redirect  maximum redirections allowed per page.\n"),
+N_("\
--proxy-user=USER   set USER as proxy username.\n"),
 N_("\
--proxy-password=PASS   set PASS as proxy password.\n"),
Index: ChangeLog
===
--- ChangeLog	(revision 2280)
+++ ChangeLog	(working copy)
@@ -1,3 +1,7 @@
+2007-07-16  Joshua David Williams [EMAIL PROTECTED]
+
+	* Added new option --max-redirect
+
 2007-07-09  Micah Cowan  [EMAIL PROTECTED]
 
 	* README, util/wget.spec: Removed references to wget.sunsite.dk.


Re: -nd not working as I would expect.

2007-07-16 Thread Josh Williams

On 7/16/07, Dax Mickelson [EMAIL PROTECTED] wrote:

I've read the man page about 10 times now and I'm sure this issue is my
own stupidity but I can't see where or how.
[..]
Thus I would expect to get a directory full of index.html.n files along
with a bunch of .zip files!  Alas, all I get is:


You have quite a few unnecessary (and repetitive) options which I have
omitted. There are too many to mention in detail, so please take note
of these for future reference (and rtfm :-).

I don't have time to walk you through it right now, unfortunately, but
here's the command you need:

wget http://librivox.org/ --output-file logs --progress=dot
--no-directories --recursive --level=100 -Aindex.html*,*zip*
-Dlibrivox.org,archive.org,www.archive.org --span-hosts --follow-ftp


Re: -nd not working as I would expect.

2007-07-16 Thread Josh Williams

On 7/16/07, Dax Mickelson [EMAIL PROTECTED] wrote:

Thanks for the quick reply.  I truly did RTFM (or at least RTF'Man').
Sorry for the dumb question and I knew it must be me but I just couldn't
see it.  I'm running the file now and it is looking good so far!


Nah, it wasn't a dumb question. To be honest, it took me quite a while
to get that one working.

Cheers!


Re: Maximum 20 Redirections HELP!!!

2007-07-16 Thread Josh Williams

On 7/16/07, Jaymz Goktug YUKSEL [EMAIL PROTECTED] wrote:

Hey Josh,

Thank you very much for that patch, this was what I was looking for, I think
this is going to solve my problem!

 Thank you very much, and have a good one!

Cordially,
James


You're welcome :-)

Let me know how it turns out. The only testing I did on it was
checking to make sure my code compiled; I haven't actually tried the
option.


Re: Maximum 20 Redirections HELP!!!

2007-07-16 Thread Josh Williams

On 7/17/07, Tony Lewis [EMAIL PROTECTED] wrote:

Just forward the patch to [EMAIL PROTECTED] and let them test it. :-)


Hmm. .org, maybe?


Delivery to the following recipient failed permanently:

[EMAIL PROTECTED]

Technical details of permanent failure:
PERM_FAILURE: DNS Error: Domain name not found


Re: bug and patch: blank spaces in filenames causes looping

2007-07-15 Thread Josh Williams

On 7/15/07, Rich Cook [EMAIL PROTECTED] wrote:

I think you may well be correct.  I am now unable to reproduce the
problem where the server does not recognize a filename unless I give
it quotes.  In fact, as you say, the server ONLY recognizes filenames
WITHOUT quotes and quoting breaks it.  I had to revert to the non-
quoted code to get proper behavior.  I am very confused now.  I
apologize profusely for wasting your time.  How embarrassing!

I'll save this email, and if I see the behavior again, I will provide
you with the details you requested below.


I wouldn't say it was a waste of time. Actually, I think it's good for
us to know that this problem exists on some servers. We're considering
writing a patch to recognise servers that do not support literal spaces:
if the request with the space as-is fails, it will retry with the space escaped.

Nothing has been written for this yet, but it has been discussed, and
may be implemented in the future.
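
The escaping step itself is trivial; the interesting part would be
detecting the failure. Something along these lines would do the
rewrite (a throwaway sketch, not anything that exists in the tree):

#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of path with every space replaced by
   "%20", for a second attempt after the literal-space request fails.
   Caller frees the result. */
static char *
escape_spaces (const char *path)
{
  size_t i, j, len = strlen (path);
  char *out = malloc (3 * len + 1);   /* worst case: all spaces */

  if (!out)
    return NULL;
  for (i = 0, j = 0; i < len; i++)
    {
      if (path[i] == ' ')
        {
          memcpy (out + j, "%20", 3);
          j += 3;
        }
      else
        out[j++] = path[i];
    }
  out[j] = '\0';
  return out;
}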


--base does not consider references to root directory

2007-07-14 Thread Josh Williams

Consider this example, which happens to be how I realised this problem:

wget http://www.mxpx.com/ -r --base=.

Here, I want the entire site to be downloaded with each link pointing
to the local file. This works for some links, but it does not take
references to the root directory into account, such as this:

<a href="/index.php">Home</a>

Here, wget just ignores the --base parameter and leaves the link as
/index.php.

I realise that this may seem like a sticky situation, but consider
this solution: Let's say that I have a photo album on my personal
homepage with the following directory scheme:

/
/photos/
/photos/hawaii
/photos/concerts

In /photos/concerts/index.html, I have a link to /index.html. When
wget parses the html, it could then become: ../../index.html. All we
need to know is how many directories deep we are.

Would this be an acceptable solution? If so, I'd be glad to write a patch.


Re: --base does not consider references to root directory

2007-07-14 Thread Josh Williams

On 7/14/07, Matthias Vill [EMAIL PROTECTED] wrote:

 So you would suggest handling it in the way that when I use
 wget --base=/some/serverdir http://server/serverdir/
 /.* will be interpreted as /some/.*, so if you have a link like
 /serverdir/ it would go back to /some/serverdir, right?


Correct.


 I guess this would be ok. Just one question: if there is a link back to
 /serverdir/ and base is something like /my/dir/, shouldn't this also be
fetched from inside /my/dir/ and not /my/serverdir/?


Take a look at the directory structure:

/my/dir
/my/dir/www.foo.bar
/my/dir/www.foo.bar/serverdir

Suppose we have a link in /my/dir/www.foo.bar/serverdir like this:

<a href="/jobs.php">Jobs</a>

This link (if called locally) would try to fetch a file on the root
directory of the operating system, not the website. It would probably
get a 403 or a 404 error. What we would want it to look like is this:

<a href="../jobs.php">Jobs</a>

This method will work no matter what the --base parameter is.


Re: --base does not consider references to root directory

2007-07-14 Thread Josh Williams

On 7/14/07, Matthias Vill [EMAIL PROTECTED] wrote:

I think I got your point:
 Now I think this could result in different problems, like what should
 happen with wget -r --base=/home/matthias/tmp
 http://server/with/a/complicated/structure/and/to/many/dirs/a.php

 If you now have a link to /index.html, you would try to access some
 file above /, or am I wrong?


In the case of 
http://server/with/a/complicated/structure/and/to/many/dirs/a.php,
a link to /index.php would look like this:

<a href="../../../../../../../../index.php">Home</a>

(Assuming I counted it correctly.) It's just a matter of knowing how
many directories deep we are so we know how many times to concatenate
the ../
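
A rough sketch of that counting step, just to show how little is
involved (it assumes the path already begins with a single '/', and it
is not a real patch):

#include <stdlib.h>
#include <string.h>

/* Given the path of the page being rewritten (e.g.
   "/with/a/complicated/.../dirs/a.php"), build the "../" prefix that
   leads back to the document root.  Caller frees the result. */
static char *
root_relative_prefix (const char *page_path)
{
  int depth = 0;
  const char *p;
  char *prefix, *q;

  for (p = page_path + 1; *p; p++)   /* every '/' past the first is one level */
    if (*p == '/')
      depth++;

  prefix = malloc (3 * depth + 1);
  if (!prefix)
    return NULL;
  for (q = prefix; depth-- > 0; q += 3)
    memcpy (q, "../", 3);
  *q = '\0';
  return prefix;                     /* empty string when the page is at the root */
}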


--delete-after and --spider should not create (and leave) directories

2007-07-12 Thread Josh Williams

It has come to my attention that --delete-after and --spider leave
empty directories behind when they have finished. IMHO, we should force
--no-directories, since we aren't keeping any of the files we
download.

I have submitted a patch here - https://savannah.gnu.org/bugs/index.php?20466

Do any of you have any objections to this change?