http://en.wikipedia.org/wiki/Wikipedia_database has some information on how to deal with the large files.
henna

On Fri, Apr 10, 2009 at 21:43, Daniel Kinzler <[email protected]> wrote:
> David Gerard schrieb:
>> 2009/4/10 Jameson Scanlon <[email protected]>:
>>
>>> Does anyone on the wikitech mailing list happen to know whether it
>>> would be possible for some of the larger Wikipedia database downloads
>>> (which are, say, 16GB or so in size) to be split into parts so that
>>> they can be downloaded? For whatever reason, whenever I have
>>> attempted to download the ~14GB files (say, from
>>> http://static.wikipedia.org/downloads/2008-06/en/ ), I have found that
>>> only 2GB (presumably, the first 2GB) of what I have sought to download
>>> has actually been downloaded. Is there any way around this? Could
>>> anyone suggest possible reasons for this difficulty in downloading
>>> the material?
>>
>> Downloading to a filesystem that only does maximum 2GB files?
>>
> Also, several HTTP clients don't like files over 2GB - this is because
> the large number of bytes in the Length field causes an integer overflow
> (2GB is the 31-bit limit). wget likes to die with a segmentation fault
> on those. I found that curl works.
>
> But of course, the file system also has to support very large files, as
> Gerard said.
>
> Finally: yes, it would be nice to have such dumps available in pieces of
> perhaps 1GB in size.
>
> -- daniel
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

--
"Maybe you knew early on that your track went from point A to B, but
unlike you I wasn't given a map at birth!"
Alyssa, "Chasing Amy"

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
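[Editor's note] Until split dumps are offered, one client-side workaround for the 2GB problems discussed above is to fetch the file in ~1GB pieces with HTTP Range requests, so no single transfer ever reports a Content-Length over the 31-bit limit. The sketch below is only illustrative and is not from the thread: the dump filename is a placeholder, and it assumes the download server honours Range requests and returns a Content-Length on a HEAD request, which may not hold for the server in question.

    #!/usr/bin/env python3
    """Minimal sketch: download a large dump in ~1GB pieces via HTTP Range requests.

    Assumptions (not from the thread): the URL below is a placeholder, and the
    server must answer HEAD with a Content-Length and honour Range (status 206).
    """
    import os
    import urllib.request

    PIECE_SIZE = 1 * 1024 ** 3   # ~1GB per piece, the size suggested in the thread
    URL = "http://static.wikipedia.org/downloads/2008-06/en/EXAMPLE-DUMP-FILE"  # placeholder
    OUT_PREFIX = "dump.part"

    def content_length(url):
        """Ask the server for the total file size with a HEAD request."""
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req) as resp:
            return int(resp.headers["Content-Length"])

    def fetch_piece(url, start, end, path):
        """Download bytes start..end (inclusive) into a separate piece file."""
        req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})
        with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
            if resp.status != 206:
                raise RuntimeError("server ignored the Range header")
            while True:
                chunk = resp.read(1 << 20)   # stream 1MB at a time; never buffer gigabytes
                if not chunk:
                    break
                out.write(chunk)

    def main():
        total = content_length(URL)
        for piece, start in enumerate(range(0, total, PIECE_SIZE)):
            end = min(start + PIECE_SIZE, total) - 1
            path = "%s%03d" % (OUT_PREFIX, piece)
            # Skip pieces that already finished, so an interrupted run can resume.
            if os.path.exists(path) and os.path.getsize(path) == end - start + 1:
                continue
            fetch_piece(URL, start, end, path)
        # Reassemble afterwards, e.g.:  cat dump.part* > dump-file

    if __name__ == "__main__":
        main()

The same idea works with curl's -r/--range option; the point is simply that each request stays well under 2GB, and each piece lands in its own file, so a FAT32-style 2GB filesystem limit would still need the pieces kept separate until they reach a filesystem that supports large files.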
