On Tue, Mar 12, 2013 at 11:27 AM, Jason Skomorowski <[email protected]> wrote:
> Thanks for the substantial contribution. Better tools to share Wikipedia
> have the potential to help many of the billions of people without reliable
> access to the Internet have at least this one repository of knowledge at
> their disposal. Important work this.
>
> On 13-01-28 09:30 PM, gnosygnu wrote:
>>
>> [*snip*]
>>
>> XOWA also has the ability to work with the full tarball dumps (hence,
>> dispensing with an always online connection). The tarball dumps are
>> quite big though (English Wikipedia is 2.2 TB), so I don't know how
>> many people would have the patience to download the entire set.
>>
>> Basically I wanted an offline reader that would also show images. The
>> on-demand download allows users to download images for articles they
>> are interested in. If they want all the images offline, then they have
>> the option of downloading the tarball dumps. I'm still looking at an
>> intermediate option between the two.
>
> Is there an option to use a path on the filesystem rather than a tarball?
> This would be a pretty huge feature for two reasons:
> * in order to sync only new files from
> http://ftpmirror.your.org/pub/wikimedia/images/wikipedia/ one needs to have
> the images extracted. Extracting multiple terabytes and recreating a tarball
> requires a lot of extra time and disk space
>
> * filesystem paths can be symlinked so that we can split this (very large)
> collection across drives
>
Sorry, I should have been more specific in my description. XOWA works off
the files and directories extracted from the tarballs, not the tarballs
themselves.

For example, you can extract "enwiki-20121201-remote-media-1.tar" to
"/home/". This generates files like "/home/wikipedia/commons/7/70/A.png".
Note that the file paths in the tarball are very similar to those on the
WMF server: in this case,
"http://upload.wikimedia.org/wikipedia/commons/7/70/A.png". XOWA can then
be pointed at the local filesystem, so that if a page with
[[File:A.png|thumb]] is opened, it will create the thumb from the local
file (instead of downloading it from upload.wikimedia.org).

If you are doing further syncing, the new files can be placed under the
"/home/wikipedia/commons" root, and as long as their paths match WMF's
style, XOWA will pick them up.

This is still not an ideal solution, as a full tarball set still needs to
be downloaded at some point, which for English Wikipedia is 2.2 TB. I am
looking at generating a "thumbs-only" archive, which would bring it down to
about 100 GB. I'd still need a way to distribute it, but will probably try
torrenting first.

Let me know if this is enough info or if you were referring to something
else.

Thanks.

_______________________________________________
Offline-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/offline-l
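
To illustrate the path correspondence described in the reply, here is a minimal Python sketch that maps an upload.wikimedia.org URL to the matching file under the directory where a tarball was extracted. This is not XOWA's actual code; the function name `local_path_for` and the parameter `extract_root` are hypothetical, and the sketch assumes the extracted tree mirrors the server's URL path exactly, as in the "/home/wikipedia/commons/7/70/A.png" example.

```python
import posixpath
from urllib.parse import unquote, urlparse


def local_path_for(url: str, extract_root: str) -> str:
    """Return the local path that mirrors the URL's server path.

    Hypothetical helper: assumes the tarball was extracted under
    extract_root and keeps the same directory layout as the server.
    """
    # The URL path looks like "/wikipedia/commons/7/70/A.png";
    # unquote() decodes percent-escapes such as "%20".
    server_path = unquote(urlparse(url).path)
    # Drop the leading "/" so join() appends to extract_root
    # instead of treating the server path as absolute.
    return posixpath.join(extract_root, server_path.lstrip("/"))


if __name__ == "__main__":
    url = "http://upload.wikimedia.org/wikipedia/commons/7/70/A.png"
    print(local_path_for(url, "/home/"))
```

Because only the path is joined, the extraction root could just as well be a symlinked directory on another drive, which is the use case raised in the quoted question.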
