Hello,

> I could be wrong, but all it may take is one server (for whatever
> reason) deciding that the download is problematic for the whole file
> download to fail.

Our download servers support resume.

> 5)    Is there some type of timeout command lying somewhere which might
> instruct the wikipedia server to quit a particular attempt to download
> a large file if it is taking too long?

No.

>       It also seems like a good idea to split large files up using a file
> splitter (whichever one takes your fancy) as larger file downloads
> would seem to be problematic for most people who have access to
> networks with only a limited connection speed.

Our download servers support range requests, which proper download
clients use to resume downloads.
Every modern HTTP client should support resuming and large files -
people are not running FAT16 any more either (which, you know, doesn't
support files over 2GB), so why would network tools and delivery be
just as ancient?
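
For illustration, a resumed download looks roughly like this on the
wire (hypothetical file name and offsets; a server that honours the
Range header answers 206 rather than 200):

```
GET /dump.xml.bz2 HTTP/1.1
Host: download.wikimedia.org
Range: bytes=1048576-

HTTP/1.1 206 Partial Content
Content-Range: bytes 1048576-2097151/2097152
```

The client then appends the returned bytes to what it already has on
disk, instead of starting over.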

>       It occurs to me that, given the randomness of this problem, this
> response might also be correspondingly random.  Still, how long might
> it take to organise something in the way of a (perhaps unix script
> automated?) file splitting for the larger wikipedia database download
> files?

There is no need - we're using standards (HTTP/1.1 range requests)
released ten years ago to do the work properly.

> already the case – but, from what I gather, once an incomplete
> database dump is downloaded – it is pretty useless, unless someone can
> correct me).


Use HTTP resume functionality:

wget --continue <url>
curl --continue-at - -O <url>

(curl's --continue-at takes an offset; "-" tells it to work the
resume point out from the existing file by itself.)
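
What those clients do under the hood can be sketched in a few lines of
Python - this is an illustrative sketch with a hypothetical URL, not
how wget or curl are actually implemented:

```python
# Sketch of HTTP download resume: check how much of the file is
# already on disk, ask the server for the rest with a Range header,
# and append the returned bytes.
import os
import urllib.request


def resume_offset(path):
    """Bytes already on disk; the download resumes from here."""
    return os.path.getsize(path) if os.path.exists(path) else 0


def resume_download(url, dest, chunk_size=1 << 16):
    offset = resume_offset(dest)
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp, open(dest, "ab") as out:
        if resp.status == 200 and offset:
            # Server ignored the Range header; restart from scratch.
            out.seek(0)
            out.truncate()
        while chunk := resp.read(chunk_size):
            out.write(chunk)
```

Run it again after an interrupted transfer and it picks up where the
partial file on disk left off.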

BR,
-- 
Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]



_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
