without touching the file system

On Thu, Jun 19, 2008 at 9:23 AM, mm w <[EMAIL PROTECTED]> wrote:
> a simple URL-rewriting conf should fix the problem, without touching the file
> system
> everything can be done server side
>
> Best Regards
>
> On Thu, Jun 19, 2008 at 6:29 AM, Coombe, Allan David (DPS)
> <[EMAIL PROTECTED]> wrote:
>> Thanks everyone for the contributions.
>>
>> Ultimately, our purpose is to process documents from the site into our
>> search database, so probably the most important thing is to limit the
>> number of files being processed.  The case of the URLs in the HTML
>> probably wouldn't cause us much concern, but I could see that it might
>> be useful to "convert" a site for mirroring from a case-insensitive
>> (Windows) environment to a case-sensitive (li|u)nix one - this would
>> need to include translation of URLs in content as well as filenames on
>> disk.
>>
>> In the meantime - does anyone know of a proxy server that could
>> translate URLs from mixed case to lower case?  I thought that if we
>> downloaded using wget via such a proxy server we might get the
>> appropriate result.
>>
>> The other alternative we were thinking of was to post process the files
>> with symlinks for all mixed case versions of files and directories (I
>> think someone already suggested this - great minds and all that...). I
>> assume that wget would correctly use the symlink to determine the
>> time/date stamp of the file for determining if it requires updating (or
>> would it use the time/date stamp of the symlink?). I also assume that if
>> wget downloaded the file it would overwrite the symlink and we would
>> have to run our "convert files to symlinks" process again.
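
The symlink post-processing idea above could be sketched roughly as follows. This is only an illustration, not anything Wget does itself: the function name `add_lowercase_symlinks` is made up here, and it adds just a lower-case alias for each mixed-case name rather than every possible spelling.

```python
import os

def add_lowercase_symlinks(root):
    """Create a lower-case symlink beside every mixed-case entry under root.

    Hypothetical helper for illustration; returns the list of links created.
    """
    created = []
    # Walk bottom-up; os.walk does not follow symlinks by default, so the
    # links made here are never descended into.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            lower = name.lower()
            link = os.path.join(dirpath, lower)
            if lower == name or os.path.lexists(link):
                continue  # already lower case, or the alias name is taken
            os.symlink(name, link)  # relative target in the same directory
            created.append(link)
    return created
```

On the time stamp question, `os.stat()` on such a link reports the target's modification time while `os.lstat()` reports the link's own. Which of the two Wget consults is not something this sketch can answer.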
>>
>> Just to put it in perspective, the actual site is approximately 45 GB
>> (that's what the administrator said) and wget downloaded > 100 GB
>> (463,000 files) when I did the first process.
>>
>> Cheers
>> Allan
>>
>> -----Original Message-----
>> From: Micah Cowan [mailto:[EMAIL PROTECTED]
>> Sent: Saturday, 14 June 2008 7:30 AM
>> To: Tony Lewis
>> Cc: Coombe, Allan David (DPS); 'Wget'
>> Subject: Re: Wget 1.11.3 - case sensetivity and URLs
>>
>>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Tony Lewis wrote:
>>> Micah Cowan wrote:
>>>
>>>> Unfortunately, nothing really comes to mind. If you'd like, you could
>>>> file a feature request at
>>>> https://savannah.gnu.org/bugs/?func=additem&group=wget, for an option
>>>> asking Wget to treat URLs case-insensitively.
>>>
>>> To have the effect that Allan seeks, I think the option would have to
>>> convert all URIs to lower case at an appropriate point in the process.
>>> I think you probably want to send the original case to the server
>>> (just in case it really does matter to the server). If you're going to
>>> treat different case URIs as matching then the lower-case version will
>>> have to be stored in the hash. The most important part (from the
>>> perspective that Allan voices) is that the versions written to disk
>>> use lower case characters.
>>
>> Well, that really depends. If it's doing a straight recursive download,
>> without preexisting local files, then all that's really necessary is to
>> do lookups/stores in the blacklist in a case-normalized manner.
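
A minimal sketch of case-normalized blacklist lookups, written in Python purely for illustration (Wget itself is C, and the names `case_key` and `Blacklist` are invented here). It keeps the first spelling seen so the original case can still be sent to the server, and it assumes the query string should stay case-sensitive.

```python
from urllib.parse import urlsplit, urlunsplit

def case_key(url):
    """Return a lower-cased form of url, used only as a lookup key."""
    parts = urlsplit(url)
    # Assumption: host and path are folded, query and fragment are not.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.lower(), parts.query, parts.fragment))

class Blacklist:
    """Set of already-seen URLs that compares host and path ignoring case."""

    def __init__(self):
        self._seen = {}  # normalized key -> first spelling encountered

    def add(self, url):
        """Record url; return False if a case variant was already seen."""
        key = case_key(url)
        if key in self._seen:
            return False
        self._seen[key] = url
        return True

    def original(self, url):
        """Return the first spelling recorded for url, or None."""
        return self._seen.get(case_key(url))
```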
>>
>> If preexisting files matter, then yes, your solution would fix it.
>> Another solution would be to scan directory contents for the first name
>> that matches case insensitively. That's obviously much less efficient,
>> but has the advantage that the file will match at least one of the
>> "real" cases from the server.
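
The directory-scan alternative might look something like this. It is a sketch with a hypothetical function name, and "first" here means first in sorted order, which is an arbitrary choice made to keep the result deterministic.

```python
import os

def find_case_insensitive(directory, name):
    """Return the first entry in directory equal to name ignoring case, or None."""
    wanted = name.lower()
    # Every lookup lists the whole directory, which is the inefficiency
    # mentioned above.
    for entry in sorted(os.listdir(directory)):
        if entry.lower() == wanted:
            return entry
    return None
```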
>>
>> As Matthias points out, your lower-case normalization solution could be
>> achieved in a more general manner with a hook. Which is something I was
>> planning on introducing perhaps in 1.13 anyway (so you could, say, run
>> sed on the filenames before Wget uses them), so that's probably the
>> approach I'd take. But probably not before 1.13, even if someone
>> provides a patch for it in time for 1.12 (too many other things to focus
>> on, and I'd like to introduce the "external command" hooks as a suite,
>> if possible).
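
To make the rename-hook idea concrete, here is a rough Python sketch. It does not describe any existing or planned Wget interface: `local_path` and `sed_like` are hypothetical names, and the `index.html` default for directory URLs is only there to keep the example complete.

```python
import re
from urllib.parse import urlsplit

def local_path(url, rename_hook=None):
    """Derive a relative local path from url, then let the hook rewrite it."""
    parts = urlsplit(url)
    path = parts.netloc + (parts.path or "/")
    if path.endswith("/"):
        path += "index.html"  # assumed default name for directory URLs
    return rename_hook(path) if rename_hook else path

def sed_like(pattern, replacement):
    """Build a hook that acts like sed's s/pattern/replacement/g."""
    compiled = re.compile(pattern)
    return lambda name: compiled.sub(replacement, name)
```

With this shape, lower-case normalization is just `local_path(url, str.lower)`, and other rewrites plug in the same way, for example `sed_like("%20", "_")`.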
>>
>> OTOH, case normalization in the blacklists would still be useful, in
>> addition to that mechanism. Could make another good addition for 1.13
>> (because it'll be more useful in combination with the rename hooks).
>>
>> - --
>> Micah J. Cowan
>> Programmer, musician, typesetting enthusiast, gamer,
>> and GNU Wget Project Maintainer.
>> http://micah.cowan.name/
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.6 (GNU/Linux)
>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>
>> iD8DBQFIUua+7M8hyUobTrERAr0tAJ98A/WCfPNhTOQ3Xcfx2eWP2stofgCcDUUQ
>> nVYivipui+0TRmmK04kD2JE=
>> =OMsD
>> -----END PGP SIGNATURE-----
>>
>
>
>
> --
> -mmw
>



-- 
-mmw
