Wget gnulib-ized

2007-10-14 Thread Micah Cowan

Mainline now has replaced a few of Wget's portability pieces with
corresponding gnulib modules. This has resulted in significant changes
to what needs to be built where, so non-Unix builds are probably further
broken (...sorry, Chris, Gisle...). Various Unix builds may
possibly have been broken as well; hopefully it'll come out in testing.

The pieces replaced were, I think, old code culled from libiberty or
otherwise from the "GNU collective pool": gnu-md5 (now md5), getopt,
safe-ctype (now c-ctype). stdint.h and stdbool.h detection/replacement
were pulled in automatically through importing those modules, but I
haven't altered the build setup to use those instead of our own builtin
stuff yet.
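
(An illustrative aside, not from the original mail: the practical difference
c-ctype brings is locale-independent, ASCII-only character classification,
which is what a protocol parser wants. A quick sketch of the idea, with
c_isalpha modeled by hand rather than taken from gnulib:)

```python
def c_isalpha(ch):
    """Stand-in for gnulib's c_isalpha: ASCII-only, locale-independent."""
    return ('A' <= ch <= 'Z') or ('a' <= ch <= 'z')

# A protocol parser wants ASCII semantics: 'Ä' is alphabetic to Python's
# Unicode-aware isalpha(), but not to the ASCII-only classifier.
print(c_isalpha('A'), c_isalpha('@'), c_isalpha('Ä'))  # True False False
print('Ä'.isalpha())  # True
```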

So, at the moment, I've just introduced tremendous instability to
mainline with the only benefit being mildly updated equivalents to about
three files from the GNU collective. ^_^

However, I expect the payoff in the long run to be worth it, as I can
now more easily take advantage of other modules gnulib offers. I expect
that the inline module could be handy for taking advantage of build
environments that offer inlined functions, and of course getpass will be
useful (though we may need to special-case our handling of that one);
the quote (for dealing with strange characters when quoting, say,
filenames) and regex (same thing Emacs uses, I believe--for the proposed
regex support in -A, -R and the like) modules are also possibilities.
And, especially, there are several ADTs that I expect I will need
shortly, in applications where string hashes may not fill the need.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: Version tracking in Wget binaries

2007-10-14 Thread Micah Cowan

Christopher G. Lewis wrote:
> OK, so I'm trying to be open minded and deal with yet another version
> control system.
> 
> I've "cloned" the repository and built my mainline.  I do not
> "autogenerate" a version.c file in windows.  Build fails missing
> version.obj.  

Right; I think I mentioned that would happen.

> Note that in the windows world, we use Nmake from the MSVC install - no
> GNU tools required.

Right; and I don't expect that you'll be able to do it exactly as I've
done. However, the contents of src/Makefile.am should give good hints
about how it could be done in Nmake. AFAIK, the only thing Unix-specific
about the rules as I've done them, in fact, is the use of the Unix "cut"
command. If absolutely necessary, that part could be removed, with the
Nmake rules similar to:

hg-id: $(OBJS)
	-hg id > $@

It's just that only the first word is needed.
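
To illustrate that last point (the hg output below is a made-up example):
"hg id" prints the short revision hash, possibly followed by tags, and the
Unix rule keeps just the first word with cut:

```shell
# Hypothetical `hg id` output: short hash, "+" for local changes, then
# any tags.  Only the first word is the revision id we want.
hg_id_output="4a3b2c1d+ tip"
echo "$hg_id_output" | cut -d' ' -f1   # prints: 4a3b2c1d+
```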

> An aside on Hg...
> 
> Confirm for me that I basically need to do the following:
> 
> Create a clone repository:
>   hg clone http://hg.addictivecode.org/wget/mainline

Approximate equivalent to svn co.

> Get any changes from mainline into my clone 
>   hg pull http://hg.addictivecode.org/wget/mainline

Equivalent to svn up.

> Make my src changes, create a "changeset"... And then I'm lost...

Alright, so you can make your changes, and issue an "hg diff", and
you've basically got what you used to do with svn.

Or, if they're larger changes, you can run "hg ci" periodically as you
change, to "save progress" so to speak.

> And as a follow-up question - what does Hg get you above and beyond CVS
> or SVN?  I kind of get the non-centralized aspect of repositories and
> clones, but I don't understand how changesets and tips work.

Well, changesets are in all SCMs, as far as I know. A changeset is just
the set-of-changes that you check in when you do "svn ci" or "hg ci".
Every revision id corresponds to and identifies a changeset.

"tip" is just the Mercurial equivalent of Subversion's "HEAD". In
Mercurial, the "tip" is always the very last revision made, whereas
"heads" are the last revision made to each unclosed branch in a repository.

> My thoughts are that there is *one* source of the code (with histories)
> regardless of SVN, Hg or whatever.

One official one, sure.

For me, the major advantages are that I can be working on several
things, each with history, without touching the official repository. I
can work on large changes while I'm in my car while my wife drives the
family out-of-town, without having to worry about screwing something up
that I can't back up to a good point (other than back to the last
"official" point in the repo, or whatever I had the foresight to "cp
-r"). And I can check in changes where each commit takes a fraction of a
second, and then push it all over the net when I'm ready for it to be
sent, instead of taking several seconds per commit. Believe me, you
begin to appreciate that after a few times.

Admittedly, these advantages are mainly advantages to pretty active
developers, which, at the moment, is pretty much just me. :) I've
definitely found use of a DVCS to be absolutely awesome for my purposes.

>   Hg's concept of multiple clones and repositories is quite interesting,
> but doesn't feel right for the remote, non-connected group of developers
> that wget gathers input from.  If we were all behind a firewall or could
> share out each user's repository, it might make more sense, but I (for
> one) wouldn't be able to share my repository (NAT'd, firewalled,
> corporate desktop), so I just don't get it.

Sharing is a potentially useful aspect of DVCSes, to be sure, but it's
not all they have going for them, and in fact isn't really the reason I
made the move.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



RE: Version tracking in Wget binaries

2007-10-14 Thread Christopher G. Lewis
OK, so I'm trying to be open minded and deal with yet another version
control system.

I've "cloned" the repository and built my mainline.  I do not
"autogenerate" a version.c file on Windows; the build fails, missing
version.obj.

Note that in the windows world, we use Nmake from the MSVC install - no
GNU tools required.

An aside on Hg...

Confirm for me that I basically need to do the following:

Create a clone repository:
  hg clone http://hg.addictivecode.org/wget/mainline

Get any changes from mainline into my clone 
  hg pull http://hg.addictivecode.org/wget/mainline

Make my src changes, create a "changeset"... And then I'm lost...

And as a follow-up question - what does Hg get you above and beyond CVS
or SVN?  I kind of get the non-centralized aspect of repositories and
clones, but I don't understand how changesets and tips work.  My
thoughts are that there is *one* source of the code (with histories)
regardless of SVN, Hg or whatever.  

  Hg's concept of multiple clones and repositories is quite interesting,
but doesn't feel right for the remote, non-connected group of developers
that wget gathers input from.  If we were all behind a firewall or could
share out each user's repository, it might make more sense, but I (for
one) wouldn't be able to share my repository (NAT'd, firewalled,
corporate desktop), so I just don't get it.  


Chris


Christopher G. Lewis
http://www.ChristopherLewis.com
 

> -Original Message-
> From: Micah Cowan [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, October 13, 2007 4:59 AM
> To: Wget
> Subject: Re: Version tracking in Wget binaries
> 
> Micah Cowan wrote:
> > Hrvoje Niksic wrote:
> >> Micah Cowan <[EMAIL PROTECTED]> writes:
> > 
> >>> Among other things, version.c is now generated rather than
> >>> parsed. Every time "make all" is run, which also means that "make
> >>> all" will always relink the wget binary, even if there 
> haven't been
> >>> any changes.
> >> I personally find that quite annoying.  :-(  I hope there's a very
> >> good reason for introducing that particular behavior.
> > 
> > Well, making version.c a generated file is necessary to get the
> > most-recent revision for the working directory. I'd like to 
> avoid it,
> > obviously, but am not sure how without making version.c dependent on
> > every source file. But maybe that's the appropriate fix. It 
> shouldn't be
> > too difficult to arrange; probably just
> >   version.c:  $(wget_SOURCES)
> > or similar.
> 
> version.c is no longer unconditionally generated. The 
> "secondary" file,
> hg-id, which is generated to contain the revision id (and is used to
> avoid using GNU's $(shell ...) extension, which autoreconf complains
> about), depends on $(wget_SOURCES), and $(LDADD) (so that it properly
> includes conditionally-used sources such as http-ntlm.c or gen-md5.c
> when applicable).
> 
> This has the advantage that every "make" does not result in 
> regenerating
> version.c, recompiling version.c and relinking wget. It has the
> potential disadvantage that, since $(wget_SOURCES) includes version.c
> itself, there is the circular dependency: version.c -> hg-id ->
> version.c. GNU Make is smart enough to catch that and throw that
> dependency out.
> 
> --
> Micah J. Cowan
> Programmer, musician, typesetting enthusiast, gamer...
> http://micah.cowan.name/
> 
> 
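
(The arrangement Micah describes in the quoted message might be sketched in
GNU Make terms as follows; this is a guess at the shape, not the actual
src/Makefile.am rules:)

```make
# hg-id depends on the sources (and conditionally-linked objects), so it
# is refreshed only when they change; version.c is regenerated only when
# hg-id changes.  GNU Make detects the circular dependency
# version.c -> hg-id -> version.c and drops it with a warning.
hg-id: $(wget_SOURCES) $(LDADD)
	-hg id > $@

version.c: hg-id
	echo "const char *version_string = \"$$(cut -d' ' -f1 hg-id)\";" > $@
```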


Re: wget default behavior

2007-10-14 Thread Tony Godshall
On 10/14/07, Hrvoje Niksic <[EMAIL PROTECTED]> wrote:
> "Tony Godshall" <[EMAIL PROTECTED]> writes:
>
> > OK, so let's go back to basics for a moment.
> >
> > wget's default behavior is to use all available bandwidth.
>
> And so is the default behavior of curl, Firefox, Opera, and so on.
> The expected behavior of a program that receives data over a TCP
> stream is to consume data as fast as it arrives.

Yup.


Re: wget default behavior [was Re: working on patch to limit to "percent of bandwidth"]

2007-10-14 Thread Tony Godshall
On 10/13/07, Josh Williams <[EMAIL PROTECTED]> wrote:
> On 10/13/07, Tony Godshall <[EMAIL PROTECTED]> wrote:
> > Well, you may have such problems but you are very much reaching in
> > thinking that my --linux-percent has anything to do with any failing
> > in linux.
> >
> > It's about dealing with unfair upstream switches, which, I'm quite
> > sure, were not running Linux.
> >
> > Let's not hijack this into a linux-bash.
>
> I really don't know what you were trying to say here...

You seemed to think --limit-percent was a solution for a misbehavior of linux.

My experience with linux networking is that it's very effective and
that upstream non-linux switches don't handle such an effective client
well.

When a linux box is my gateway/firewall I don't experience
single-client monopolization at all.

As to your linux issues, that's a topic that should probably be discussed
in another forum, but I will say that I'm quite happy with the latest
Linux kernels- with the low-latency patch integrated and enabled my
desktop experience is quite snappy, even on this four-year-old 1.2GHz
laptop.  And stay away from the distro "server" kernels- they are
optimized for throughput at the cost of latency- they do their I/O in
bigger chunks.  And stay away from the RT kernels- they go too far in
giving I/O priority over everything else and end up churning on IRQs
unless they are very carefully tuned.

And no, I won't call the linux kernel GNU/Linux, if that was what you
were after.  The kernel is after all the one Linux thing in a
GNU/Linux system.

> .. I use GNU/Linux.

Anyone try Debian GNU/BSD yet?  Or Debian/Nexenta/GNU/Solaris?

-- 
Best Regards.
Please keep in touch.


Re: css @import parsing

2007-10-14 Thread Micah Cowan

Andreas Pettersson wrote:
> Andreas Pettersson wrote:
>> Has there been any progress with this patch since this post?
>> http://www.mail-archive.com/wget@sunsite.dk/msg09502.html
> *bump*
> 
> Anyone know the status of this?

Not yet installed... don't know what else to tell you, except that it's
slated to be included in Wget 1.12. Wget 1.11 is expected to be released
quite soon (just waiting for resolution of some licensing stuff), and
I'm afraid to say that CSS support won't be ready in time for that.

However, I too am very interested to see CSS support included in Wget;
it'll be in when we have time to look at it more closely, and is one of
my higher priorities for Wget 1.12.
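
(For the curious, the parsing task itself is small; here is a hypothetical
sketch of pulling URLs out of @import rules, not the patch's actual code:)

```python
import re

# @import may use a quoted string or url(), optionally followed by media
# types, e.g.:  @import "base.css";  @import url(print.css) print;
IMPORT_RE = re.compile(
    r"""@import\s+(?:url\(\s*)?["']?([^"')\s;]+)["']?\s*\)?\s*[^;]*;""",
    re.IGNORECASE,
)

def css_imports(css_text):
    """Return the URLs referenced by @import rules in a stylesheet."""
    return IMPORT_RE.findall(css_text)

print(css_imports('@import "base.css"; @import url(print.css) print;'))
# -> ['base.css', 'print.css']
```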

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: Myriad merges

2007-10-14 Thread Micah Cowan

Jochen Roderburg wrote:
> Zitat von Micah Cowan <[EMAIL PROTECTED]>:

>> It's hard to be confident I'm not introducing more issues, with the
>> state of http.c being what it is. So please beat on it! :)
> 
> This time it survived the beating  ;-)

Yay!! :D

>> One issue I'm still aware of is that, if -c and -e
>> contentdisposition=yes are specified for a file already fully
>> downloaded, HEAD will be sent for the contentdisposition, and yet a GET
>> will still be sent to fetch the remainder of the -c (resulting in a 416
>> Requested Range Not Satisfiable). Ideally, Wget should be smart enough
>> to see from the HEAD that the Content-Length already matches the file's
>> size, even though -c no longer requires a HEAD (again). We _got_ one, we
>> should put it to good use.
>>
>> However, I'm not worried about addressing this before 1.11 releases;
>> it's a minor complaint, and with content-disposition's current
>> implementation, users are already going to be expecting an extra HEAD
>> round-trip in the general case; what's a few extra?
> 
> Agreed. I can confirm this behaviour, too. And I would also consider this a
> minor issue, at least the result is correct.
> 
> I have also not made many tests where content-disposition is really used
> for the filename. Those few "real-life" cases that I have at hand do not
> send any special headers like timestamps and file lengths with it. At
> least the local filename is set correctly and is correctly renamed if it
> exists.

And I expect there are probably several bugs lurking here (which is why
I've designated it as "experimental"). After the 1.11 release I want to
revisit that section, and look more closely at what happens if we get a
Content-Disposition at the last minute, especially if it specifies a
local file name that we are rejecting. I'd prefer that it not use HEAD
at all for that, as I expect Content-Disposition is rare enough that it
doesn't justify issuing HEAD just to see if it's present; and in any case
it probably frequently isn't sent with HEAD responses, but only for GET.
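
(The check described in the quoted text is simple; a hypothetical sketch
with invented names, not Wget's actual code:)

```python
import os

def should_skip_get(local_path, head_content_length):
    """With -c, if the HEAD response's Content-Length already matches the
    local file's size, there is no remainder to fetch, so the follow-up
    GET (and its 416 response) can be skipped."""
    try:
        local_size = os.path.getsize(local_path)
    except OSError:
        return False  # no local file: a GET is genuinely needed
    return head_content_length == local_size
```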

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: css @import parsing

2007-10-14 Thread Andreas Pettersson

Andreas Pettersson wrote:

Has there been any progress with this patch since this post?
http://www.mail-archive.com/wget@sunsite.dk/msg09502.html

*bump*

Anyone know the status of this?

--
Andreas




Re: Myriad merges

2007-10-14 Thread Jochen Roderburg
Zitat von Micah Cowan <[EMAIL PROTECTED]>:

>
> Micah Cowan wrote:
> > Jochen Roderburg wrote:
> >> Unfortunately, however, a new regression crept in:
> >> In the case timestamping=on, content-disposition=off, no local file
> >> present, it now does no HEAD (correctly), but two (!!) GETs and
> >> transfers the file twice.
> >
> > Ha! Okay, gotta get that one fixed...
>
> That should now be fixed.
>
> It's hard to be confident I'm not introducing more issues, with the
> state of http.c being what it is. So please beat on it! :)

This time it survived the beating  ;-)
Seems that we are finally converging. The double GET is gone, and my other test
cases still work as expected, including the -c variants.

> One issue I'm still aware of is that, if -c and -e
> contentdisposition=yes are specified for a file already fully
> downloaded, HEAD will be sent for the contentdisposition, and yet a GET
> will still be sent to fetch the remainder of the -c (resulting in a 416
> Requested Range Not Satisfiable). Ideally, Wget should be smart enough
> to see from the HEAD that the Content-Length already matches the file's
> size, even though -c no longer requires a HEAD (again). We _got_ one, we
> should put it to good use.
>
> However, I'm not worried about addressing this before 1.11 releases;
> it's a minor complaint, and with content-disposition's current
> implementation, users are already going to be expecting an extra HEAD
> round-trip in the general case; what's a few extra?

Agreed. I can confirm this behaviour, too. And I would also consider this a
minor issue; at least the result is correct.

I have also not made many tests where content-disposition is really used for the
filename. Those few "real-life" cases that I have at hand do not send any
special headers like timestamps and file lengths with it. At least the local
filename is set correctly and is correctly renamed if it exists.

Best regards, and thanks again for fixing all the issues that I found,

Jochen Roderburg



Re: wget default behavior

2007-10-14 Thread Hrvoje Niksic
"Tony Godshall" <[EMAIL PROTECTED]> writes:

> OK, so let's go back to basics for a moment.
>
> wget's default behavior is to use all available bandwidth.

And so is the default behavior of curl, Firefox, Opera, and so on.
The expected behavior of a program that receives data over a TCP
stream is to consume data as fast as it arrives.
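
(Wget does already offer an absolute cap via --limit-rate; a percent-based
option like the one proposed would presumably ride on the same throttling
machinery. A standalone sketch of sleep-based throttling, an illustration
rather than Wget's actual implementation:)

```python
import time

def rate_limited_copy(chunks, limit_bps):
    """After each chunk, sleep just long enough that the average rate
    stays at or below limit_bps (bytes per second).  Returns the total
    number of bytes consumed."""
    start = time.monotonic()
    received = 0
    for chunk in chunks:
        received += len(chunk)
        expected_elapsed = received / limit_bps   # seconds this should take
        actual_elapsed = time.monotonic() - start
        if expected_elapsed > actual_elapsed:
            time.sleep(expected_elapsed - actual_elapsed)
    return received
```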