subject:".1, .2 before suffix rather than after"

Re: .1, .2 before suffix rather than after

2007-11-29 Thread Josh Williams

On 11/29/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
> Yeah... of course they won't be able to edit the wiki that way.

I doubt you'd get the slashdot effect from just the people who're
interested in editing the wiki. You may get a handful of developers
and a few thousand people who only want to read it :-)

Re: .1, .2 before suffix rather than after

2007-11-29 Thread Micah Cowan

Josh Williams wrote:
> On 11/29/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
>> Well, the trouble with that is that I'm running all of Wget's stuff
>> (plus my own personal mail and whatnot) on a little VPS. I'm rather
>> concerned that the traffic will kill me. I'm already worried about it
>> potentially hitting SlashDot or Digg because it's the first Wget release
>> in quite a while. D:
> 
> Tada! http://en.wikipedia.org/wiki/Coral_Content_Distribution_Network
> 
> There's also archive.org.

Yeah... of course they won't be able to edit the wiki that way.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

Re: .1, .2 before suffix rather than after

2007-11-29 Thread Josh Williams

On 11/29/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
> Well, the trouble with that is that I'm running all of Wget's stuff
> (plus my own personal mail and whatnot) on a little VPS. I'm rather
> concerned that the traffic will kill me. I'm already worried about it
> potentially hitting SlashDot or Digg because it's the first Wget release
> in quite a while. D:

Tada! http://en.wikipedia.org/wiki/Coral_Content_Distribution_Network

There's also archive.org.

Re: .1, .2 before suffix rather than after

2007-11-29 Thread Micah Cowan

Josh Williams wrote:
> On 11/29/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
>> I dunno, man, I think our current wget2 roadmap goals are already pretty
>> wild-and-crazy. ;)
> 
> I agree. I think we should create an announcement asking for
> developers to help and submit it to digg and slashdot. The new
> features may get some excitement going and start rumors. :-P
> 
> ^^ in all seriousness ^^

Well, the trouble with that is that I'm running all of Wget's stuff
(plus my own personal mail and whatnot) on a little VPS. I'm rather
concerned that the traffic will kill me. I'm already worried about it
potentially hitting SlashDot or Digg because it's the first Wget release
in quite a while. D:

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

Re: .1, .2 before suffix rather than after

2007-11-29 Thread Josh Williams

On 11/29/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
> I dunno, man, I think our current wget2 roadmap goals are already pretty
> wild-and-crazy. ;)

I agree. I think we should create an announcement asking for
developers to help and submit it to digg and slashdot. The new
features may get some excitement going and start rumors. :-P

^^ in all seriousness ^^

Re: .1, .2 before suffix rather than after

2007-11-29 Thread Micah Cowan

Tony Godshall wrote:
> ...
>> At the release of Wget 1.11, it is my intention to try to attract as
>> much developer interest as possible. At the moment, and despite Wget's
>> pervasive presence, it has virtually no user or developer community.
>> Given the amount of work that needs to be done, this is not good. The
>> announcement of the first new release of GNU Wget in two years seems a
>> great opportunity to solicit help!
> ...
> 
> That's sort of the nature of older tools with a well-defined mission -
> they do their job so well there's little itch to tweak them.  If it
> ain't broken, you don't fix it.  Freshmeat lists wget as "mature",
> which basically means the same thing.

Yeah, I imagine that's it. Except that Wget _is_ broken in several
important ways... but I think it works for the vast majority of users.
In particular, I think the most widespread use of Wget is for fetching
single files, which Wget seldom has any problems doing. It's when you
try tricky things that Wget can sometimes break your expectations.

Even so, of course, I have rarely if ever run into problems using it,
personally.

> I guess wget will have to get a bit immature to get some buzz going.  Some
> pretty insane goals in a wget2 roadmap would probably do the trick.  How
> about announcing plans to implement DHT and make bittorrent obsolete?  That
> should make slashdot ;-)

I dunno, man, I think our current wget2 roadmap goals are already pretty
wild-and-crazy. ;)

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

Re: .1, .2 before suffix rather than after

2007-11-29 Thread Tony Godshall

...
> At the release of Wget 1.11, it is my intention to try to attract as
> much developer interest as possible. At the moment, and despite Wget's
> pervasive presence, it has virtually no user or developer community.
> Given the amount of work that needs to be done, this is not good. The
> announcement of the first new release of GNU Wget in two years seems a
> great opportunity to solicit help!
...

That's sort of the nature of older tools with a well-defined mission -
they do their job so well there's little itch to tweak them.  If it
ain't broken, you don't fix it.  Freshmeat lists wget as "mature",
which basically means the same thing.

I guess wget will have to get a bit immature to get some buzz going.  Some
pretty insane goals in a wget2 roadmap would probably do the trick.  How
about announcing plans to implement DHT and make bittorrent obsolete?  That
should make slashdot ;-)

Tony

--
The above is not to be taken seriously.

Re: .1, .2 before suffix rather than after

2007-11-29 Thread Micah Cowan

Josh Williams wrote:
> On Nov 29, 2007 6:20 PM, David Ginger
> <[EMAIL PROTECTED]> wrote:
>> So can I ask: is a wget2 actually being developed?
> 
> Go ahead, but I'll answer that question before you do ;-)
> 
> The answer is no - not at the moment. But we've been discussing it for
> several months. It will be a while before any code is actually
> written.

Specifically, it will probably be years, unless we can get a much-needed
influx of developers in here. The issues targeted at Wget 1.12
are many, and most of them really should be resolved before we begin
work on the "beefier" Wget. And, as I am (1) by far the most active
current Wget developer, and (2) not all that terribly active, given that
it's all just in my spare time ;) - work is liable to be a bit slow.

The good news is, once the Wget 1.12 stuff is out of the way, we can
move almost all focus to the new thing, as Wget will be almost
completely in bug-fixes-only mode. Given that's the case, one might
argue that Wget 2.0 is in fact a reasonable name for the new package.
I'm still thinking about that stuff, and will probably add a Wiki page
for the purpose of names discussion soon.

At the release of Wget 1.11, it is my intention to try to attract as
much developer interest as possible. At the moment, and despite Wget's
pervasive presence, it has virtually no user or developer community.
Given the amount of work that needs to be done, this is not good. The
announcement of the first new release of GNU Wget in two years seems a
great opportunity to solicit help!

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

Re: .1, .2 before suffix rather than after

2007-11-29 Thread Josh Williams

On Nov 29, 2007 6:20 PM, David Ginger
<[EMAIL PROTECTED]> wrote:
> So can I ask: is a wget2 actually being developed?

Go ahead, but I'll answer that question before you do ;-)

The answer is no - not at the moment. But we've been discussing it for
several months. It will be a while before any code is actually
written.

Re: .1, .2 before suffix rather than after

2007-11-29 Thread David Ginger

> i totally agree with hrvoje here. also note that changing wget's
> unique-name-finding algorithm can potentially break lots of wget-based
> scripts out there. i think we should leave these kinds of changes for wget2
> - or wget-on-steroids or however you want to call it ;-)

So can I ask: is a wget2 actually being developed?

Re: .1, .2 before suffix rather than after

2007-11-29 Thread Mauro Tortonesi

On Sunday 04 November 2007 22:54:24 Hrvoje Niksic wrote:
> Micah Cowan <[EMAIL PROTECTED]> writes:
> > Christian Roche has submitted a revised version of a patch to modify
> > the unique-name-finding algorithm to generate names in the pattern
> > "foo-n.html" rather than "foo.html.n". The patch looks good, and
> > will likely go in very soon.
>
> foo.html.n has the advantage of simplicity: you can tell at a glance
> that foo.html.n is a duplicate of foo.html.  Also, it is trivial to
> remove the unwanted files by removing foo.html.*.  Why change what
> worked so well in the past?

i totally agree with hrvoje here. also note that changing wget's
unique-name-finding algorithm can potentially break lots of wget-based
scripts out there. i think we should leave these kinds of changes for wget2 -
or wget-on-steroids or however you want to call it ;-)
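
To make the compatibility concern concrete, here is a minimal sketch of the kind of clean-up script that depends on the current "foo.html.n" scheme. The function name and the file names are illustrative assumptions, not taken from any real script; the point is only that such a script silently stops finding duplicates if they are renamed to "foo-n.html".

```python
import glob
import os
import re
import tempfile


def remove_numbered_duplicates(directory, name):
    """Delete name.1, name.2, ... in directory; return the removed basenames."""
    removed = []
    pattern = os.path.join(glob.escape(directory), glob.escape(name) + ".*")
    for path in glob.glob(pattern):
        suffix = os.path.basename(path)[len(name) + 1:]
        if re.fullmatch(r"\d+", suffix):  # only purely numeric suffixes
            os.remove(path)
            removed.append(os.path.basename(path))
    return sorted(removed)


with tempfile.TemporaryDirectory() as d:
    # foo-1.html stands in for a duplicate created under the proposed scheme.
    for fname in ("foo.html", "foo.html.1", "foo.html.2", "foo-1.html"):
        open(os.path.join(d, fname), "w").close()
    removed = remove_numbered_duplicates(d, "foo.html")
    survivors = sorted(os.listdir(d))

print(removed)    # duplicates in the current scheme are found and deleted
print(survivors)  # the dash-style duplicate is left behind unnoticed
```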

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi                          http://www.tortonesi.com

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it

Re: .1, .2 before suffix rather than after

2007-11-16 Thread Hrvoje Niksic

"Tony Lewis" <[EMAIL PROTECTED]> writes:

> Hrvoje Niksic wrote:
>> > And how is filename.tar.gz renamed?  filename.tar-1.gz?
>> Ouch.
>
> OK. I'm responding to the chain and not Hrvoje's expression of pain. :-)
>
> What if we changed the semantics of --no-clobber so the user could specify
> the behavior? I'm thinking it could accept the following strings:
> - after: append a number after the file name (current behavior)
> - before: insert a number before the suffix

But see Andreas's post quoted above: the term "suffix" is ambiguous.
In foo.tar.gz, what is the suffix?  How about .emacs.el?  And
Heroes.S203.DivX.avi?
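
The ambiguity can be illustrated with a short sketch that applies two plausible definitions of "suffix" to the file names mentioned above. Both helper functions are assumptions made for illustration; neither is what Wget or the proposed patch actually does.

```python
import os

NAMES = ["foo.html", "foo.tar.gz", ".emacs.el", "Heroes.S203.DivX.avi"]


def number_before_last_dot(name, n):
    """Treat only the final ".ext" as the suffix (the os.path.splitext rule)."""
    stem, ext = os.path.splitext(name)
    return f"{stem}.{n}{ext}"


def number_before_first_dot(name, n):
    """Treat everything from the first dot onward as the suffix."""
    stem, dot, rest = name.partition(".")
    return f"{stem}.{n}{dot}{rest}" if dot else f"{name}.{n}"


last_dot = {name: number_before_last_dot(name, 1) for name in NAMES}
first_dot = {name: number_before_first_dot(name, 1) for name in NAMES}

for name in NAMES:
    print(f"{name:22} {last_dot[name]:26} {first_dot[name]}")
```

Neither rule gives a sensible answer for all four names: the last-dot rule splits "foo.tar.gz" in the middle of its compound extension, while the first-dot rule puts the number in front of a hidden file's whole name and in the middle of the movie title.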

Currently implemented name mangling is far from perfect, but it's easy
to understand, to recognize, and to reverse.  One other possibility
that offers the same features would be to put the number before the
file, such as "1.foo.html" instead of "foo.html.1"; but that seems
hardly an improvement.

> - new: change name of new file (current behavior)
> - old: change name of old file

It would be nice to be able to change the name of the old file, but
when you start to consider the consequences, it gets trickier.  What
do you do when you have many files left over from previous runs, such
as foo, foo.1, foo.2, etc.?  Handling it correctly would trigger a
flurry of renames, which would need to be carried out in the correct
order, be prepared to handle a rename failing, and to detect changed
conditions in mid-run.  In general it seems like bad design to need to
touch many files in order to simply download one.  Maybe the improved
end user experience makes it worth it, but at this point I'm not
convinced of it.
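
As a rough sketch of the "flurry of renames" described here, the following shifts every existing numbered file up by one so that the plain name becomes free for the new download. It is an assumption-laden illustration: the function name is made up, and it deliberately ignores the hard parts mentioned above (a rename failing halfway, or other programs changing the files mid-run).

```python
import os
import tempfile


def rotate_old_versions(path):
    """Shift path -> path.1, path.1 -> path.2, ... so that path becomes free."""
    highest = 0
    while os.path.exists(f"{path}.{highest + 1}"):
        highest += 1
    # Work from the highest number down; going upward would overwrite
    # foo.2 with foo.1 before foo.2 had been moved out of the way.
    for n in range(highest, 0, -1):
        os.rename(f"{path}.{n}", f"{path}.{n + 1}")
    if os.path.exists(path):
        os.rename(path, f"{path}.1")


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "foo")
    for name, body in (("foo", "a"), ("foo.1", "b"), ("foo.2", "c")):
        with open(os.path.join(d, name), "w") as fh:
            fh.write(body)
    rotate_old_versions(target)
    listing = sorted(os.listdir(d))
    with open(target + ".1") as fh:
        moved_from_foo = fh.read()
    with open(target + ".3") as fh:
        moved_from_foo2 = fh.read()

print(listing)  # three renames were needed just to free the name "foo"
```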

> Back to the painful point at the start of this note, I think we
> treat ".tar.gz" as a suffix and if --no-clobber=before is specified,
> the file name becomes "filename.1.tar.gz".

But see my other examples above.

RE: .1, .2 before suffix rather than after

2007-11-16 Thread Tony Lewis

Hrvoje Niksic wrote:

> > And how is filename.tar.gz renamed?  filename.tar-1.gz?
>
> Ouch.

OK. I'm responding to the chain and not Hrvoje's expression of pain. :-)

What if we changed the semantics of --no-clobber so the user could specify
the behavior? I'm thinking it could accept the following strings:
- after: append a number after the file name (current behavior)
- before: insert a number before the suffix
- new: change name of new file (current behavior)
- old: change name of old file

With this scheme --no-clobber becomes equivalent to --no-clobber=after,new.
If I want to change where the number appears in the file name or have the
old file renamed then I can specify the behavior I want on the command line
(or in .wgetrc). I think I would change my default to
--no-clobber=before,old.
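
A minimal sketch of how such an option value might be parsed follows. Note that this is purely hypothetical: the keyword form of --no-clobber is only a proposal in this message, and the function name and error text are assumptions made for illustration.

```python
def parse_no_clobber(value=""):
    """Parse the proposed keyword list into a (position, target) pair."""
    position, target = "after", "new"  # defaults mirror the current behavior
    for token in filter(None, (t.strip() for t in value.split(","))):
        if token in ("after", "before"):
            position = token
        elif token in ("new", "old"):
            target = token
        else:
            raise ValueError(f"unknown --no-clobber keyword: {token!r}")
    return position, target


print(parse_no_clobber(""))            # plain --no-clobber
print(parse_no_clobber("before,old"))  # the default suggested above
print(parse_no_clobber("before"))      # unspecified half keeps its default
```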

I think it would be useful to have semantics in .wgetrc where I specify what
I want my --no-clobber default to be without that meaning I want
--no-clobber processing on each invocation. It would be nice if I could say
that I want my default to be "before,old", but to only have that apply when
I specify --no-clobber on the command line.

Back to the painful point at the start of this note, I think we treat
".tar.gz" as a suffix and if --no-clobber=before is specified, the file name
becomes "filename.1.tar.gz".

Tony

Re: .1, .2 before suffix rather than after

2007-11-06 Thread Hrvoje Niksic

Andreas Pettersson <[EMAIL PROTECTED]> writes:

> And how is filename.tar.gz renamed?  filename.tar-1.gz?

Ouch.

Re: .1, .2 before suffix rather than after

2007-11-06 Thread Andreas Pettersson


Hrvoje Niksic wrote:

> It just occurred to me that this change breaks backward compatibility.
> It will break scripts that try to clean up after Wget or that in any
> way depend on the current naming scheme.


I'm also a bit hesitant about changing the way files get named.

With a .1 at the absolute end of the filename I _know_ this file got its 
name because there already was a file with the same name. If the new 
file instead is named filename-1.jpg I cannot be certain if this is 
because of a file collision, or if the original file really had this 
name, which of course it might have had.


If a script is supposed to restore the original filename of a downloaded
file (perhaps for future downloads), it's easy to just cut the trailing
number, if there is one. How could that be done in an easy and secure
way if there might be a number before the extension, a number that I
don't even know if it's part of the original filename or not?
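
The difference can be sketched with two small regular-expression helpers. Both function names are made up for illustration, and the second one assumes the dash form ("filename-1.jpg") discussed in this thread.

```python
import re


def strip_trailing_number(name):
    """Undo the current scheme: drop a final ".N" if there is one."""
    return re.sub(r"\.\d+$", "", name)


def strip_embedded_number(name):
    """Try to undo the proposed scheme: drop "-N" just before the extension."""
    return re.sub(r"-\d+(\.[^.]+)$", r"\1", name)


print(strip_trailing_number("photo.jpg.1"))  # duplicate -> original name
print(strip_trailing_number("photo.jpg"))    # untouched
print(strip_embedded_number("photo-1.jpg"))  # duplicate -> original name
print(strip_embedded_number("page-1.html"))  # a genuine name is mangled too
```

The last call shows the problem: a file that was really called "page-1.html" on the server is indistinguishable from a renamed duplicate of "page.html". The trailing-number rule has its own rare false positives (a file genuinely named "manual.1", for instance), but such names are much less likely to be served as web content.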


And already having local files named filename-1.ext is not so uncommon. What
happens if there is a local file with that name? filename-2.ext could be the
answer, but that makes it really difficult to find downloaded files
programmatically.


And how is filename.tar.gz renamed?  filename.tar-1.gz?

Sorry, but I'm not so sure about this..

--
Andreas

Re: .1, .2 before suffix rather than after

2007-11-06 Thread Micah Cowan

Christopher G. Lewis wrote:
> Hmm - changing the rename schema would potentially create a HUGE issue with
> clobbering.
> 
> For example, and quite hypothetical...
> 
> Given a directory with the following:
>   index.html
>   index-1.html
>   index.1.html
> 
> All three are served by the server and rendered by the browser.  They are
> distinct files given the file system and the URL interpretation of the file
> system by the web server.
> 
> Now, Wget downloads index.html, then downloads it again.  Our choices for
> the second file are:
>   1) index.html.1
>   2) index-1.html
>   3) index.1.html
> 
> Of the three, only #1 is pretty much guaranteed *not* to exist on the web
> server.  Why?  Because by changing the extension, we've changed the content
> type.  So if our intentions are to not clobber (which, I believe, is the
> whole point) we are *much* better off sticking with the current schema and
> creating a file that most likely can't be served by the web server.

Of course you are 100% correct that it is the whole point.

However, while this is indeed a problem, I don't think it's a clobbering
problem. I believe Wget would then choose (or could be made to then
choose) index-2.html, etc, for the file which on the server is named
index-1.html.

Of course, while that would resolve clobbering, that would make it
virtually impossible to determine what file had what local name, which
is entirely unacceptable.

I wonder how Wget currently handles perverse cases like index.html.1
actually existing on the server and already on the local system. :)

> Note that this is quite a contrived example to illustrate the point.

Yeah. Unfortunately, though, something like page-1.html, page-2.html,
isn't quite so unlikely.

It's intended that Reget (I'll call it that for now, until we figure out
what the hell we're going to do with that whole cluster of
functionality) will have support for a database of download-session
metadata, that would handle mappings between the remote URI and the
local file. With that, it'd be possible to construct a simple utility
which could be invoked like "reget-fmap http://example.com/foo.html"
and might spit out something like "./example.com/foo.html".
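
A minimal sketch of what such a URI-to-local-file mapping could look like, using SQLite. Everything here is an assumption for illustration: Reget's metadata format is not specified anywhere in this thread, and the table and function names are invented.

```python
import sqlite3


def open_session_db(path=":memory:"):
    """Open (or create) a hypothetical download-session metadata store."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS fmap ("
        "uri TEXT PRIMARY KEY, local_path TEXT NOT NULL)"
    )
    return db


def record_download(db, uri, local_path):
    """Remember which local file a remote URI was saved to."""
    db.execute(
        "INSERT OR REPLACE INTO fmap (uri, local_path) VALUES (?, ?)",
        (uri, local_path),
    )
    db.commit()


def lookup(db, uri):
    """Return the local path recorded for uri, or None if it is unknown."""
    row = db.execute(
        "SELECT local_path FROM fmap WHERE uri = ?", (uri,)
    ).fetchone()
    return row[0] if row else None


db = open_session_db()
record_download(db, "http://example.com/foo.html", "./example.com/foo.html")
print(lookup(db, "http://example.com/foo.html"))
print(lookup(db, "http://example.com/missing.html"))
```

With a mapping like this, the local name no longer has to encode where the file came from, so the renaming scheme becomes a matter of taste rather than of recoverability.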

This might couple quite well with providing a plugin hook to control the
renaming scheme.

Given your excellent points, and the fact that I didn't get the
overwhelmingly positive response to this suggestion that I had
anticipated, I'd better table this patch. :(

> However, my 2 cents on the behavior - It would be *wonderful* if wget could
> look at the local file system and rename each version to file.ext.n+1 so the
> new download is index.html, not index.html.1.  I've been caught a couple of
> times with this, so to me the default behavior is backwards (ie, new file
> s/b the URL, older files get versioned)

That would of course be substantially more work, and provide even
greater opportunity for race conditions/interoperability issues than we
already do, but I agree that it'd be nice-to-have.

Unfortunately, I don't think there's any way we'll ever do this in Wget:
 it'd be too confusing for people used to the current way. And while, as
Hrvoje pointed out, the currently proposed suffixes patch could
potentially break backwards compatibility, it's not likely to do so in a
harmful/destructive way, whereas any current scripts that currently
download files and then erase the renamed ones will suddenly be
destroying the new data, rather than the old, if we reverse the renaming. :\

That problem is partly exacerbated by the fact that, from a certain
perspective, we ought to be able to "stick our noses in the air" and
claim that any scripts of that sort ought to have been telling Wget to
clobber files, rather than letting Wget rename them and trying to delete
them afterwards... but there is not currently a way to tell Wget to do that.
There is no way to ask Wget to clobber files when it normally wouldn't.

However, with the proper hook in Reget, it'd be easy enough to have a
plugin that handles it this way. Actually, since Reget is looking to
probably be an entirely new beast, and we'll certainly have to break
compatibility with traditional Wget, we could consider making this the
default renaming mechanism for Reget; but I'm still concerned about the
extra work, race conditions, and potential for screwing with other
programs that may be operating on some of the files involved.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


RE: .1, .2 before suffix rather than after

2007-11-06 Thread Christopher G. Lewis

Hmm - changing the rename schema would potentially create a HUGE issue with
clobbering.

For example, and quite hypothetical...

Given a directory with the following:
  index.html
  index-1.html
  index.1.html

All three are served by the server and rendered by the browser.  They are
distinct files given the file system and the URL interpretation of the file
system by the web server.

Now, Wget downloads index.html, then downloads it again.  Our choices for
the second file are:
  1) index.html.1
  2) index-1.html
  3) index.1.html

Of the three, only #1 is pretty much guaranteed *not* to exist on the web
server.  Why?  Because by changing the extension, we've changed the content
type.  So if our intentions are to not clobber (which, I believe, is the
whole point) we are *much* better off sticking with the current schema and
creating a file that most likely can't be served by the web server.
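
The hypothetical directory above can be written down as a tiny check of which candidate names coincide with something the server itself serves. The scheme labels are made up for illustration.

```python
# Files the hypothetical server really serves.
server_files = {"index.html", "index-1.html", "index.1.html"}

# Candidate local names for a second download of index.html.
candidates = {
    "after": "index.html.1",  # current scheme
    "dash": "index-1.html",   # proposed scheme
    "dot": "index.1.html",    # alternative proposed scheme
}

collides = {scheme: name in server_files for scheme, name in candidates.items()}
print(collides)
```

Only the current scheme produces a name that is absent from the server's own listing in this example.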

Note that this is quite a contrived example to illustrate the point.

However, my 2 cents on the behavior - It would be *wonderful* if wget could
look at the local file system and rename each version to file.ext.n+1 so the
new download is index.html, not index.html.1.  I've been caught a couple of
times with this, so to me the default behavior is backwards (ie, new file
s/b the URL, older files get versioned)

Chris

Christopher G. Lewis
http://www.ChristopherLewis.com

> -Original Message-
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, November 04, 2007 4:19 PM
> To: Wget
> Cc: Christian Roche
> Subject: Re: .1, .2 before suffix rather than after
> 
> Hrvoje Niksic <[EMAIL PROTECTED]> writes:
> 
> > Micah Cowan <[EMAIL PROTECTED]> writes:
> >
> >> Christian Roche has submitted a revised version of a patch to modify
> >> the unique-name-finding algorithm to generate names in the pattern
> >> "foo-n.html" rather than "foo.html.n". The patch looks good, and
> >> will likely go in very soon.
> >
> > foo.html.n has the advantage of simplicity: you can tell at a glance
> > that foo.html.n is a duplicate of foo.html.  Also, it is trivial to
> > remove the unwanted files by removing foo.html.*.
> 
> It just occurred to me that this change breaks backward compatibility.
> It will break scripts that try to clean up after Wget or that in any
> way depend on the current naming scheme.
> 

Re: .1, .2 before suffix rather than after

2007-11-05 Thread Micah Cowan

Hrvoje Niksic wrote:
> Micah Cowan <[EMAIL PROTECTED]> writes:
> 
>>> It just occurred to me that this change breaks backward compatibility.
>>> It will break scripts that try to clean up after Wget or that in any
>>> way depend on the current naming scheme.
>> It may. I am not going to commit to never ever changing the current
>> naming scheme.
> 
> Agreed, but there should be a very good reason for changing it, and
> the change should be a clear improvement.

How do those reasons differ? :)

> In my view, neither is the
> case here.

It seems like a fairly clear improvement to me; at least, I believe that
the improvement would outweigh the rather mild risk that it might break
something. It's a mild improvement, but it's an even milder risk, AFAICT.

> For example, the change to respect the Content-Disposition
> header constitutes a good reason[1].

(I don't seem to have the footnote you seem to have intended to put there.)

I'm not sure how good an example Content-Disposition is, though, given
that the risk of backwards-incompatibility is probably virtually nil:
this is a more general change, whereas that is a specific change
(to a certain subset of URLs).

Of course, your opinion is important to me, and to be honest, I didn't
expect to find any resistance to this idea (there were no comments
besides mine on the original post back in July). So I welcome further
feedback. However, at the moment, I don't see any compelling reason not
to apply the change, and do find reason to apply it (interoperability
seems like a desirable trait).

Hm... wget-patches seems not to be archived... There's a supposed link
to a gmane archive, but it's apparently empty. :\ That makes it
difficult to refer to the original post from July 13.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: .1, .2 before suffix rather than after

2007-11-05 Thread Hrvoje Niksic

Micah Cowan <[EMAIL PROTECTED]> writes:

>> It just occurred to me that this change breaks backward compatibility.
>> It will break scripts that try to clean up after Wget or that in any
>> way depend on the current naming scheme.
>
> It may. I am not going to commit to never ever changing the current
> naming scheme.

Agreed, but there should be a very good reason for changing it, and
the change should be a clear improvement.  In my view, neither is the
case here.  For example, the change to respect the Content-Disposition
header constitutes a good reason[1].

Re: .1, .2 before suffix rather than after

2007-11-04 Thread Steven M. Schweda

   I don't care particularly how this stuff works, but if you'd like to
do me a favor, please make sure, whatever the final scheme is, that it's
easy to add the #ifdef for VMS to bypass the whole mess, because the
file version numbers on VMS obviate it.



   Steven M. Schweda               [EMAIL PROTECTED]
   382 South Warwick Street        (+1) 651-699-9818
   Saint Paul  MN  55105-2547

Re: .1, .2 before suffix rather than after

2007-11-04 Thread Micah Cowan

Hrvoje Niksic wrote:
> Hrvoje Niksic <[EMAIL PROTECTED]> writes:
> 
>> Micah Cowan <[EMAIL PROTECTED]> writes:
>>
>>> Christian Roche has submitted a revised version of a patch to modify
>>> the unique-name-finding algorithm to generate names in the pattern
>>> "foo-n.html" rather than "foo.html.n". The patch looks good, and
>>> will likely go in very soon.
>> foo.html.n has the advantage of simplicity: you can tell at a glance
>> that foo.html.n is a duplicate of foo.html.  Also, it is trivial to
>> remove the unwanted files by removing foo.html.*.
> 
> It just occurred to me that this change breaks backward compatibility.
> It will break scripts that try to clean up after Wget or that in any
> way depend on the current naming scheme.

It may. I am not going to commit to never ever changing the current
naming scheme. It is the responsibility of the upgrader to read the NEWS
file, after all.

Obviously I don't want to wantonly break backward compatibility, but
this seems like a worthwhile change, and I can't imagine there being a
particularly high number of such scripts.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: .1, .2 before suffix rather than after

2007-11-04 Thread Josh Williams

On 11/4/07, Hrvoje Niksic <[EMAIL PROTECTED]> wrote:
> It just occurred to me that this change breaks backward compatibility.
> It will break scripts that try to clean up after Wget or that in any
> way depend on the current naming scheme.
>

You mean the scripts that fix the same problem this patch does? ;-)

Re: .1, .2 before suffix rather than after

2007-11-04 Thread Hrvoje Niksic

Hrvoje Niksic <[EMAIL PROTECTED]> writes:

> Micah Cowan <[EMAIL PROTECTED]> writes:
>
>> Christian Roche has submitted a revised version of a patch to modify
>> the unique-name-finding algorithm to generate names in the pattern
>> "foo-n.html" rather than "foo.html.n". The patch looks good, and
>> will likely go in very soon.
>
> foo.html.n has the advantage of simplicity: you can tell at a glance
> that foo.html.n is a duplicate of foo.html.  Also, it is trivial to
> remove the unwanted files by removing foo.html.*.

It just occurred to me that this change breaks backward compatibility.
It will break scripts that try to clean up after Wget or that in any
way depend on the current naming scheme.

Re: .1, .2 before suffix rather than after

2007-11-04 Thread Micah Cowan

Hrvoje Niksic wrote:
> Micah Cowan <[EMAIL PROTECTED]> writes:
> 
>> Christian Roche has submitted a revised version of a patch to modify
>> the unique-name-finding algorithm to generate names in the pattern
>> "foo-n.html" rather than "foo.html.n". The patch looks good, and
>> will likely go in very soon.
> 
> foo.html.n has the advantage of simplicity: you can tell at a glance
> that foo.html.n is a duplicate of foo.html.  Also, it is trivial to
> remove the unwanted files by removing foo.html.*.  Why change what
> worked so well in the past?

Well, the original motivation for Chris was that it was actually
interfering with the accept/reject rules; see the log.txt attachment at
https://savannah.gnu.org/bugs/index.php?20482; this behavior is also
related to the -nd/-r behavior I brought up yesterday.

However, that's obviously not a good long-term fix for the problem; the
real reason _I_ like it, is that it preserves the type of the files, on
systems/applications that depend on the filename extension to identify
it. Most browsers I've seen, including Lynx (though for Lynx you can
specify a flag to override it, I think) depend on this, at least for
HTML; and even for JPEGs and such on Unixen it is often beneficial to
have an extension that matches the type. It automatically gives an
"-E"-like benefit (for this instance; not for URLs that don't end with
appropriate extensions).
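
A small sketch of the extension-based type detection being described: an application that looks only at the final extension recognizes "foo-1.html" but not "foo.html.1". The lookup table below is a deliberately tiny, hypothetical stand-in for whatever table a real browser or desktop uses.

```python
import os

# Hypothetical extension table, for illustration only.
TYPES_BY_EXTENSION = {
    ".html": "text/html",
    ".jpg": "image/jpeg",
    ".gz": "application/gzip",
}


def guess_type_from_name(name):
    """Return a media type based solely on the final extension, or None."""
    ext = os.path.splitext(name)[1].lower()
    return TYPES_BY_EXTENSION.get(ext)


print(guess_type_from_name("foo.html"))    # original file
print(guess_type_from_name("foo.html.1"))  # current scheme: extension is ".1"
print(guess_type_from_name("foo-1.html"))  # proposed scheme keeps ".html"
```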

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: .1, .2 before suffix rather than after

2007-11-04 Thread Hrvoje Niksic

Micah Cowan <[EMAIL PROTECTED]> writes:

> Christian Roche has submitted a revised version of a patch to modify
> the unique-name-finding algorithm to generate names in the pattern
> "foo-n.html" rather than "foo.html.n". The patch looks good, and
> will likely go in very soon.

foo.html.n has the advantage of simplicity: you can tell at a glance
that foo.html.n is a duplicate of foo.html.  Also, it is trivial to
remove the unwanted files by removing foo.html.*.  Why change what
worked so well in the past?
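
For reference, the two naming schemes under discussion can be sketched side by side. This is an illustration under stated assumptions, not Wget's actual unique-name code nor Christian Roche's patch: the function names are invented, and the "before" variant simply splits at the last dot.

```python
import os
import tempfile


def unique_name_after(path):
    """Current scheme as described here: foo.html, foo.html.1, foo.html.2, ..."""
    if not os.path.exists(path):
        return path
    n = 1
    while os.path.exists(f"{path}.{n}"):
        n += 1
    return f"{path}.{n}"


def unique_name_before(path, sep="-"):
    """Proposed scheme: foo.html, foo-1.html, ... (sep="." gives foo.1.html)."""
    if not os.path.exists(path):
        return path
    stem, ext = os.path.splitext(path)  # splits at the last dot only
    n = 1
    while os.path.exists(f"{stem}{sep}{n}{ext}"):
        n += 1
    return f"{stem}{sep}{n}{ext}"


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "foo.html")
    open(target, "w").close()  # simulate an earlier download
    after = os.path.basename(unique_name_after(target))
    dashed = os.path.basename(unique_name_before(target))
    dotted = os.path.basename(unique_name_before(target, sep="."))

print(after, dashed, dotted)
```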

> A couple of minor detail questions: what do you guys think about using
> "foo.n.html" instead of "foo-n.html"?

Better, but IMHO not as good as foo.html.n.  But I'm obviously biased.
:-)

Re: .1, .2 before suffix rather than after

2007-11-04 Thread Micah Cowan

Josh Williams wrote:
> On 11/4/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
>> Christian Roche has submitted a revised version of a patch to modify the
>> unique-name-finding algorithm to generate names in the pattern
>> "foo-n.html" rather than "foo.html.n". The patch looks good, and will
>> likely go in very soon.
> 
> That's something I had meant to submit a bug report for a while back,
> but somehow never found the time to do it. I guess it wasn't my top
> priority since GNU/Linux is usually smart enough to ignore the file
> extensions anyways.

I have not found that to be generally true; and particularly in the case
of HTML files, which is most relevant here.

>> A couple of minor detail questions: what do you guys think about using
>> "foo.n.html" instead of "foo-n.html"? And (this one to Gisle), how would
>> this naming convention affect DOS (and, BTW, how does the current one
>> hold up on DOS)?
> 
> Well, this problem is mainly for win32 users, so I think we need to
> keep sloppy coding in mind. It's been my experience that *many* win32
> programs will treat everything after the first period as the file
> extension.
> 
> Honestly, I don't see any reason to risk the annoyance of these kinds
> of bugs. Just go with the dash.

Yeah, and that was probably the reason for it.

> (On a side note, have you thought of running FreeDOS in a virtual machine?)

I have, but haven't gotten around to it, and probably won't for a while.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHLizQ7M8hyUobTrERCACFAJ4oJ/y+EGLiRyCj+qLaxbAEFWkSSwCfc5pQ
dS3sv26PHop1Hfz73FcpFRg=
=lVrq
-END PGP SIGNATURE-

Re: .1, .2 before suffix rather than after

2007-11-04 Thread Josh Williams

On 11/4/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
> Christian Roche has submitted a revised version of a patch to modify the
> unique-name-finding algorithm to generate names in the pattern
> "foo-n.html" rather than "foo.html.n". The patch looks good, and will
> likely go in very soon.

That's something I had meant to submit a bug report for a while back,
but somehow never found the time to do it. I guess it wasn't my top
priority since GNU/Linux is usually smart enough to ignore the file
extensions anyways.

> A couple of minor detail questions: what do you guys think about using
> "foo.n.html" instead of "foo-n.html"? And (this one to Gisle), how would
> this naming convention affect DOS (and, BTW, how does the current one
> hold up on DOS)?

Well, this problem is mainly for win32 users, so I think we need to
keep sloppy coding in mind. It's been my experience that *many* win32
programs will treat everything after the first period as the file
extension.

Honestly, I don't see any reason to risk the annoyance of these kinds
of bugs. Just go with the dash.
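
To make the concern concrete, here is a small sketch contrasting the two
ways a program might find the extension. Both function names are made up
for illustration and do not come from Wget or from any particular win32
program.

```c
#include <assert.h>
#include <string.h>

/* Extension as a careful program sees it: from the last dot onward.
   Returns NULL if NAME contains no dot.  */
const char *
ext_after_last_dot (const char *name)
{
  assert (name != NULL);
  return strrchr (name, '.');
}

/* Extension as a sloppy program might see it: from the first dot
   onward.  Returns NULL if NAME contains no dot.  */
const char *
ext_after_first_dot (const char *name)
{
  assert (name != NULL);
  return strchr (name, '.');
}
```

For "foo.1.html" the first-dot version yields ".1.html" while the
last-dot version yields ".html"; for "foo-1.html" both yield ".html",
so the dash keeps the two kinds of program in agreement.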

(On a side note, have you thought of running FreeDOS in a virtual machine?)

.1, .2 before suffix rather than after

2007-11-04 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Christian Roche has submitted a revised version of a patch to modify the
unique-name-finding algorithm to generate names in the pattern
"foo-n.html" rather than "foo.html.n". The patch looks good, and will
likely go in very soon.

A couple of minor detail questions: what do you guys think about using
"foo.n.html" instead of "foo-n.html"? And (this one to Gisle), how would
this naming convention affect DOS (and, BTW, how does the current one
hold up on DOS)?

If I don't get an answer soon, I'll probably just go ahead and apply the
patch, and plan to make any necessary adjustments later. I suspect that
if DOS, Windows, or other systems need special treatment, they'll need
to use their own version of unique_name_1 anyway.

I've attached the patch for reference. The only beefs I currently have
with it are that we should prefer strrchr() to a for-loop; and I'd prefer
more robust handling of the alloca'd buffer size (but these are easily
fixed).
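
For readers without the attachment, the idea can be sketched roughly as
follows. This is not the patch itself and not Wget's unique_name_1: the
function name numbered_name and its caller-supplied buffer are
assumptions made for the example. It uses strrchr() to find the suffix
and snprintf() to keep the buffer size honest.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not Wget's unique_name_1: for FILE of the form
   "prefix.suffix", write "prefix-N.suffix" into BUF; if FILE has no
   suffix, write "FILE-N".  Returns 0 on success, -1 if BUF is too
   small.  */
int
numbered_name (const char *file, int n, char *buf, size_t bufsize)
{
  const char *slash, *base, *dot;
  int len;

  assert (file != NULL && buf != NULL && n > 0);

  /* Only look for the suffix in the last path component.  */
  slash = strrchr (file, '/');
  base = slash ? slash + 1 : file;
  dot = strrchr (base, '.');

  /* A leading dot (".bashrc") marks a hidden file, not a suffix.  */
  if (dot == NULL || dot == base)
    len = snprintf (buf, bufsize, "%s-%d", file, n);
  else
    len = snprintf (buf, bufsize, "%.*s-%d%s",
                    (int) (dot - file), file, n, dot);

  return (len < 0 || (size_t) len >= bufsize) ? -1 : 0;
}
```

A caller would still have to loop over N and test each candidate for
existence; switching to the "foo.n.html" form would only mean changing
the "-" in the two format strings to ".".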

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHLhQx7M8hyUobTrERCEUoAJ9dO7OK6X8B4YraDTptgmjMrEYnTgCgirvE
JVFv+RUdcwONlOf2/OKaAPM=
=8nRY
-END PGP SIGNATURE-
diff -r ca1ba64545bc doc/ChangeLog
--- a/doc/ChangeLog Tue Oct 23 12:34:10 2007 -0700
+++ b/doc/ChangeLog Sat Nov 03 12:49:25 2007 +
@@ -1,3 +1,8 @@ 2007-10-13  Micah Cowan  <[EMAIL PROTECTED]
+2007-10-29  Christian Roche <[EMAIL PROTECTED]>
+
+   * wget.texi:
+   Updated description of file renaming scheme.
+
 2007-10-13  Micah Cowan  <[EMAIL PROTECTED]>
 
* wget.texi : Replaced mention of no-longer
diff -r ca1ba64545bc doc/wget.texi
--- a/doc/wget.texi Tue Oct 23 12:34:10 2007 -0700
+++ b/doc/wget.texi Sat Nov 03 12:49:25 2007 +
@@ -573,18 +573,18 @@ cases, the local file will be @dfn{clobb
 cases, the local file will be @dfn{clobbered}, or overwritten, upon
 repeated download.  In other cases it will be preserved.
 
-When running Wget without @samp{-N}, @samp{-nc}, @samp{-r}, or @samp{p},
-downloading the same file in the same directory will result in the
-original copy of @var{file} being preserved and the second copy being
-named @[EMAIL PROTECTED]  If that file is downloaded yet again, the
-third copy will be named @[EMAIL PROTECTED], and so on.  When
[EMAIL PROTECTED] is specified, this behavior is suppressed, and Wget will
-refuse to download newer copies of @[EMAIL PROTECTED]  Therefore,
[EMAIL PROTECTED]'' is actually a misnomer in this mode---it's not
-clobbering that's prevented (as the numeric suffixes were already
-preventing clobbering), but rather the multiple version saving that's
+When running Wget without @samp{-N}, @samp{-nc}, or @samp{-r}, downloading the
+same file in the same directory will result in the original copy of @var{file}
+being preserved and the second copy being named
[EMAIL PROTECTED]@[EMAIL PROTECTED], assuming @var{file} = @var{prefix.suffix}.
+If that file is downloaded yet again, the third copy will be named
[EMAIL PROTECTED]@[EMAIL PROTECTED], and so on. When @samp{-nc} is specified,
+this behavior is suppressed, and Wget will refuse to download newer copies of
[EMAIL PROTECTED]@var{file}}. Therefore, [EMAIL PROTECTED]'' is actually a misnomer in
+this mode---it's not clobbering that's prevented (as the numeric suffixes were
+already preventing clobbering), but rather the multiple version saving that's
 prevented.
-
+  
 When running Wget with @samp{-r} or @samp{-p}, but without @samp{-N}
 or @samp{-nc}, re-downloading a file will result in the new copy
 simply overwriting the old.  Adding @samp{-nc} will prevent this
@@ -1611,7 +1611,7 @@ details.
 @item -l @var{depth}
 @itemx [EMAIL PROTECTED]
 Specify recursion maximum depth level @var{depth} (@pxref{Recursive
-Download}).  The default maximum depth is 5.
+Download}).  The default maximum depth is 5.  Zero means infinite recursion.
 
 @cindex proxy filling
 @cindex delete after retrieval
diff -r ca1ba64545bc src/ChangeLog
--- a/src/ChangeLog Tue Oct 23 12:34:10 2007 -0700
+++ b/src/ChangeLog Sat Nov 03 12:52:17 2007 +
@@ -1,3 +1,13 @@ 2007-10-22  Gisle Vanem  <[EMAIL PROTECTED]
+2007-10-29  Christian Roche <[EMAIL PROTECTED]>
+
+   * utils.c (unique_name_1):
+   Modified filename generation scheme when avoiding clobbering to preserve file extensions.
+   
+   * recurc.c (download_child_p, point 6):
+   When checking whether a URL should be treated as HTML, use
+   link_expect_html flag instead of relying on the written file extension
+   by calling has_html_suffix_p.
+
 2007-10-22  Gisle Vanem  <[EMAIL PROTECTED]>
 
* mswindows.c: Move INHIBIT_WRAP macro definition up with wget.h
diff -r ca1ba64545bc src/recur.c
--- a/src/recur.c   Tue Oct 23 12:34:10 2007 -0700
+++ b/src/recur.c   Sat N
