http://en.wikipedia.org/wiki/Wikipedia_database has some information
on how to deal with the large files.
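
If your client is one of those that chokes past 2GB (see Daniel's note
below), one workaround is to fetch the dump in smaller HTTP Range requests
and append the pieces locally. A rough Python sketch, assuming the server
honours Range headers; the URL, piece size and output name are just
placeholders:

import urllib.request

# Rough sketch: fetch the dump in 1GB slices via HTTP Range requests so
# no single response body crosses the 2GB (2**31 byte) mark.  The URL is
# a placeholder, the server must support Range, and the local filesystem
# still has to allow files bigger than 2GB for the reassembled output.
URL = "http://static.wikipedia.org/downloads/2008-06/en/SOME-DUMP-FILE"
PIECE = 1024 ** 3                        # 1GB per request
OUT = "dump.downloaded"

def content_length(url):
    # HEAD request just to learn the total size
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers["Content-Length"])

total = content_length(URL)
with open(OUT, "wb") as out:
    start = 0
    while start < total:
        end = min(start + PIECE, total) - 1
        req = urllib.request.Request(
            URL, headers={"Range": "bytes=%d-%d" % (start, end)})
        with urllib.request.urlopen(req) as resp:
            while True:
                chunk = resp.read(1 << 20)   # copy 1MB at a time
                if not chunk:
                    break
                out.write(chunk)
        start = end + 1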

henna

On Fri, Apr 10, 2009 at 21:43, Daniel Kinzler <[email protected]> wrote:
> David Gerard schrieb:
>> 2009/4/10 Jameson Scanlon <[email protected]>:
>>
>>> Does anyone on the wikitech mailing list happen to know whether it
>>> would be possible for some of the larger wikipedia database downloads
>>> (which are, say, 16GB or so in size) to be split into parts so that
>>> they can be downloaded.  For whatever reason, whenever I have
>>> attempted to download the ~14GB files (say, from
>>> http://static.wikipedia.org/downloads/2008-06/en/ ), I have found that
>>> only 2GB (presumably, the first 2GB) of what I have sought to download
>>> has actually been downloaded.  Is there any way around this?  Could
>>> anyone possibly suggest what possible reasons there might be for this
>>> difficulty in downloading the material?
>>
>>
>> Downloading to a filesystem that only supports files up to 2GB?
>>
>
> Also, several HTTP clients don't like files over 2GB - the value in the
> Content-Length header overflows a signed 32-bit integer (2GB is the 2^31
> limit). wget likes to die with a segmentation fault on those. I found that
> curl works.
>
> But of course, the file system also has to support very large files, as 
> Gerard said.
>
> Finally: yes, it would be nice to have such dumps available in pieces of
> perhaps 1GB in size.
>
> -- daniel
>
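
Re Daniel's last point: until the dumps are offered pre-split, a mirror
could cut them into ~1GB pieces with something along these lines (the
input file name is just an example); downloaders can then reassemble the
parts with cat (or copy /b on Windows):

# Sketch: cut a big dump into ~1GB pieces that stay under the 2GB limits
# discussed above.  Pieces can be reassembled with e.g.
#   cat enwiki-2008-06.sql.part* > enwiki-2008-06.sql
PIECE = 1024 ** 3        # 1GB per piece
BUF = 1 << 20            # copy 1MB at a time

def split_file(path, piece=PIECE):
    with open(path, "rb") as src:
        index = 0
        while True:
            remaining = piece
            chunk = src.read(min(BUF, remaining))
            if not chunk:
                break                      # end of input, no new piece
            with open("%s.part%03d" % (path, index), "wb") as dst:
                while chunk:
                    dst.write(chunk)
                    remaining -= len(chunk)
                    if remaining <= 0:
                        break
                    chunk = src.read(min(BUF, remaining))
            index += 1

split_file("enwiki-2008-06.sql")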




-- 
"Maybe you knew early on that your track went from point A to B, but
unlike you I wasn't given a map at birth!" Alyssa, "Chasing Amy"

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
