Jameson Scanlon wrote:
> I should state some of the following items of info in response to the
> email correspondence received :
>
> 1) Windows version information (I am not providing the full 'winver'
> response obtained, because its probably not necessary – all that I
> imagine that you'd need to know is the approximate windows OS upon
> which I am attempting to download the relevant information).
>
> Microsoft (R) Windows
> Version 5.1 (Build 2600.xpsp_sp3_gdr.080814-1236 : Service Pack 3)
> Copyright (C) 2007 Microsoft Corporation
The information that might actually be relevant is whether the disk
you're trying to download the dump to is using FAT or NTFS: FAT32 only
supports files up to 4 GiB, while NTFS should be able to handle larger
files without trouble.

> I should have stated in my original statement that sometimes it is
> possible for me to download more than 4GB, but that (for some reason
> or other) the download cuts out (dunno why).

Well, if so, that does rather suggest that the file system is not the
problem.

> 3) As a separate point, it occurs to me that one of the reasons for
> why the download might cut out is that there are a sequence of servers
> (according to tracert) upon which I rely for the download to proceed.
> I could be wrong, but all it may take is one server (for whatever
> reason) deciding that the download is problematic for the whole file
> download to fail.

The servers listed by tracert are only passing IP packets between your
computer and Wikimedia's server. They don't know or care whether you're
downloading one big file or several small ones, so they shouldn't make
any difference. However, if your browser is configured to use a proxy,
and the proxy can't handle large files properly, that could indeed be a
problem.

> It also seems like a good idea to split large files up using a file
> splitter (whichever one takes your fancy) as larger file downloads
> would seem to be problematic for most people who have access to
> networks with only a limited connection speed.
>
> It occurs to me that, given the randomness of this problem, this
> response might also be correspondingly random. Still, how long might
> it take to organise something in the way of a (perhaps unix script
> automated?) file splitting for the larger wikipedia database download
> files?

No, it wouldn't be difficult to do at all; the main issue, I'd assume,
is that we'd have to store all the data twice if we wanted to provide
both single-file and split versions of the dumps.
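For what it's worth, the unix-script splitting being asked about really is just a couple of standard commands. A minimal sketch, using a small generated file in place of a real multi-gigabyte dump (all the file names here are made up for illustration):

```shell
# Stand in for the real multi-gigabyte dump with a small generated file:
seq 1 100000 > dump.xml
bzip2 -f dump.xml                        # produces dump.xml.bz2

# Split into fixed-size pieces (64 kB here; something like 1 GB in practice):
split -b 64k dump.xml.bz2 dump.xml.bz2.part-

# Each piece can then be downloaded separately and the dump reassembled;
# the glob expands in lexicographic order, which is the correct piece order:
cat dump.xml.bz2.part-* > reassembled.bz2
cmp dump.xml.bz2 reassembled.bz2 && echo "reassembled copy matches the original"
```

The pieces are plain byte ranges of the compressed file, so reassembly is a straight concatenation; publishing a checksum of the whole file alongside the pieces would let downloaders verify the result.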
(Technically, it should be possible to write a PHP script or something
to deliver individual chunks from a single large file on demand, but
that would have its own complications.)

Anyway, if the problem is that the download gets interrupted halfway
through, what you really want is a download client (such as wget -c)
that knows how to resume interrupted downloads from where they left
off. Recent versions of Firefox apparently have some limited support
for this, but I'm not sure there's any way to get Firefox to resume a
download once it has decided it failed.

> PS – If it were ever the case that bit torrent were used for the
> dissemination of large files (there has been some mention of this on
> the wikipedia database download talk page), I can still imagine that
> there might be problems with trying to propagate the WHOLE of such a
> large file (~14GB) – though this assertion might run contrary to other
> peoples experiences.

Given that people routinely use BitTorrent to download movie files of
several dozen gigabytes, I don't think it would have any problem with a
mere 14 GiB database dump.

> Anyhow, it occurs to me that, for the interests
> of redundancy, it would be worthwhile to figure out whether there's a
> way of changing the structure of the wikipedia database download so
> that, even if only the first 1GB of the database were downloaded, it
> would still be possible to read the information on it (perhaps this is
> already the case – but, from what I gather, once an incomplete
> database dump is downloaded – it is pretty useless, unless someone can
> correct me).

Actually, a truncated database dump should be perfectly usable; it just
won't have all the data in it. Indeed, for some purposes even a piece
from the middle of the dump file can be used to extract useful data,
although many standard tools won't be able to decompress and parse it.
--
Ilmari Karonen

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
