On Tue, Mar 12, 2013 at 11:27 AM, Jason Skomorowski
<[email protected]> wrote:
> Thanks for the substantial contribution. Better tools to share Wikipedia
> have the potential to help many of the billions of people without reliable
> access to the Internet have at least this one repository of knowledge at
> their disposal. Important work this.
> On 13-01-28 09:30 PM, gnosygnu wrote:
>>
>> [*snip*]
>>
>> XOWA also has the ability to work with the full tarball dumps (hence,
>> dispensing with an always online connection). The tarball dumps are
>> quite big though (English Wikipedia is 2.2 TB), so I don't know how
>> many people would have the patience to download the entire set.
>>
>> Basically I wanted an offline reader that would also show images. The
>> on-demand download allows users to download images for articles they
>> are interested in. If they want all the images offline, then they have
>> the option of downloading the tarball dumps. I'm still looking at an
>> intermediate option between the two.
>
> Is there an option to use a path on the filesystem rather than a tarball?
> This would be a pretty huge feature for two reasons:
> * in order to sync only new files from
> http://ftpmirror.your.org/pub/wikimedia/images/wikipedia/ one needs to have
> the images extracted. Extracting multiple terabytes and recreating a tarball
> requires a lot of extra time and disk space
>
> * filesystem paths can be symlinked so that we can split this (very large)
> collection across drives
>

Sorry, I should have been more specific in my description. XOWA
works off the files and directories extracted from the tarballs, not
the tarballs themselves.

For example, you can extract "enwiki-20121201-remote-media-1.tar" to
"/home/". It will generate files like
"/home/wikipedia/commons/7/70/A.png". Note that the file paths in the
tarball are very similar to those on the WMF server: in this case,
"http://upload.wikimedia.org/wikipedia/commons/7/70/A.png". XOWA can
then be redirected to use the local filesystem so that if a page with
[[File:A.png|thumb]] is opened, it will create the thumb from there
(instead of downloading it from upload.wikimedia.org). If you are
doing further syncing, the new files can be placed under the
"/home/wikipedia/commons" root, and as long as they match WMF's
directory layout, XOWA will pick them up.
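
To make the path correspondence concrete, here is a rough sketch in Python (not XOWA code). The function names local_path_for and hashed_dir are made up for illustration. The hashing rule in hashed_dir is my understanding of MediaWiki's hashed upload layout (MD5 of the file name with spaces replaced by underscores), so treat it as an assumption rather than something XOWA guarantees.

```python
import hashlib
import posixpath
from urllib.parse import urlparse


def local_path_for(url, extract_root):
    """Map an upload.wikimedia.org URL to the matching path under extract_root.

    Hypothetical helper: it simply reuses the URL path below the extraction root.
    """
    url_path = urlparse(url).path  # e.g. "/wikipedia/commons/7/70/A.png"
    return posixpath.join(extract_root, url_path.lstrip("/"))


def hashed_dir(file_name):
    """Return the two-level hash directory ("x/xy") for a file name.

    Assumption: MediaWiki's hashed upload layout, i.e. the first one and
    first two hex digits of the MD5 of the name with spaces as underscores.
    """
    normalized = file_name.replace(" ", "_")
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return digest[0] + "/" + digest[:2]


if __name__ == "__main__":
    url = "http://upload.wikimedia.org/wikipedia/commons/7/70/A.png"
    # Where the file should sit after extracting the tarball to "/home/"
    print(local_path_for(url, "/home/"))
    # Where a newly synced file would need to go below the commons root
    print(posixpath.join("/home/wikipedia/commons", hashed_dir("A.png"), "A.png"))
```

A newly synced file would need to land in the directory that hashed_dir computes for its name; otherwise a lookup based on the WMF layout would not find it.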

This is still not an ideal solution, as a full tarball set still needs
to be downloaded at some point -- which, for English Wikipedia, is
2.2 TB. I am looking at generating a "thumbs-only" archive, which
should bring it down to about 100 GB. I'd still need a way to
distribute it, but will probably try torrenting first.

Let me know if this is enough info or if you were referring to something else.

Thanks.

_______________________________________________
Offline-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/offline-l
