Re: [Offline-l] How important is ZIM support in Collections?

Bjoern Hassler Wed, 13 Nov 2013 03:43:53 -0800

Hi Erik, Frederico, Emmanuel, hi all,

I just wanted to give some input on this. We are using MediaWiki to
provide teacher education resources for teachers in sub-Saharan
Africa, see http://www.oer4schools.org. (The big picture is the 2nd
MDG of achieving Universal Primary Education.) We chose MediaWiki
because it's an open platform that has a lot of momentum behind it,
and because we can produce PDF and offline versions, which is
essential for us. The new visual editor is also an excellent
development.

Here are some of the things that are important to us. (Seeing as this
is a longer email, I've put this onto my blog as well, see
http://bjohas.de/Blog, with a bit more formatting, and some additional
notes.)

= Issues with current PDF generation =

Seeing as this thread is about pdf, I'll start with some issues around the pdf:

* We are essentially writing educational materials, and would like a
way of putting text in boxes to flag the nature of that text (e.g. as
a transcript, background reading, or a note meant for facilitators). We
have implemented this quite straightforwardly through a div with a
border or a different background colour. However, because the current
PDF rendering uses the wiki text (rather than HTML), all this
formatting is lost. See http://www.oer4schools.org for examples of
how we use boxes.

* Numbered section headings. Sections in the PDF aren't numbered,
which isn't helpful (the magic word NUMBEREDHEADINGS is ignored). This
may not be a problem for Wikipedia articles, but it is a problem when
writing materials for teacher education, where you need to be able to
refer to a section by its number (e.g. during workshops).

* We also make extensive use of the Semantic MediaWiki extension, e.g.
to assign episodes to our videos. Again, this isn't supported in the
current PDF rendering pipeline.

I am not fully up to speed with what the plans are, but if the
proposal is html->pdf rendering, rather than wiki text -> pdf
rendering, then the above issues would be solved anyway.

= Our use cases =

More widely: What are our use cases? Our OER4Schools resource is used
by teachers in Zambia for professional development, with very limited
connectivity. The following scenarios are critical in this work (and
would be similarly critical for most teacher education scenarios in
sub-Saharan Africa):

'''Scenario 1: Pdf / print.''' We need to be able to print our whole
professional development programme (around 200 pages). At the moment,
we print each wiki page needed to pdf, and then collate them. It's not
a great process. We can't use the collection extension because of the
above issues.

'''Scenario 2: Use on local web server.''' We would like to be able to
produce a static stand-alone version of the wiki (in html) that can
run off a local web server. It would be good if links to any
non-static content pointed back at the live version (e.g. links to
other namespaces, such as 'Special', as well as 'edit'/history links).
Ideally, the same (or a similar) version could run off a memory stick
for use on netbooks. We have tinkered with some scripts, and there are
other scripts out there: We'd love some help in finding something
robust.
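
To make the link handling in Scenario 2 a bit more concrete, here is a
minimal sketch in Python of the rule I have in mind. It is only an
illustration, not one of the scripts we have tried: the live base URL,
the list of namespaces and the function name are all made up, and it
assumes root-relative links such as /wiki/Title or /index.php?title=...

```python
from urllib.parse import parse_qs

# Assumed values for illustration only; adjust to the real wiki.
LIVE_BASE = "http://live.example.org"
NON_STATIC_NAMESPACES = ("Special:",)
NON_STATIC_ACTIONS = {"edit", "history"}


def rewrite_href(href: str) -> str:
    """Point links to non-static content back at the live wiki.

    Links to ordinary pages are returned unchanged so that they keep
    resolving inside the static copy.
    """
    # Absolute, protocol-relative and in-page links are left alone.
    if href.startswith(("http://", "https://", "//", "#")):
        return href

    # Drop any fragment, then split the path from the query string.
    target = href.split("#", 1)[0]
    path, _, query_string = target.partition("?")
    query = parse_qs(query_string)

    # The page title is either ?title=... or the last path segment.
    title = query.get("title", [""])[0] or path.rsplit("/", 1)[-1]
    action = query.get("action", [""])[0]

    if title.startswith(NON_STATIC_NAMESPACES) or action in NON_STATIC_ACTIONS:
        # Simplification: treats the href as relative to the site root.
        return LIVE_BASE + "/" + href.lstrip("/")
    return href
```

A static export script could apply a function like this to every href
it writes out, so that page-to-page links stay local while 'Special',
'edit' and history links go to the live site.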

'''Scenario 3: Use on tablets / phones.''' We would love to have a
version for mobile phones and tablets. Tablets are overtaking netbooks
at the moment, and are starting to become available cheaply. This
comes in two versions:

* '''Offline access:''' We'd love to have some advice on how we can
achieve this with ZIM. I guess one issue is that we would want to
update our resource, and it would be good if that didn't mean that the
whole resource needs to be downloaded again. The biggest items are
uploads (files, images, audio, video). I think it would be ok for the
wiki text to be re-downloaded, but it would not be feasible for us to
re-download uploads.

* '''Online access:''' We'd love some advice on how to adapt the
Wikipedia apps to work with our wiki, to give efficient access.
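
On the offline update point above, the behaviour we are after could be
sketched roughly as follows in Python. This is a generic
manifest-comparison idea, not something I know ZIM or Kiwix to provide:
the function names and the notion of a per-file manifest of digests are
assumptions for illustration.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying one version of a file."""
    return hashlib.sha256(data).hexdigest()


def plan_update(local: dict, remote: dict):
    """Compare two manifests mapping file name -> digest.

    Returns (to_download, to_delete): files that are new or changed on
    the server, and files that no longer exist there. Unchanged uploads
    appear in neither list, so they are never fetched again.
    """
    to_download = sorted(
        name for name, digest in remote.items() if local.get(name) != digest
    )
    to_delete = sorted(name for name in local if name not in remote)
    return to_download, to_delete
```

For example, if the device already holds a video whose digest matches
the server's manifest, plan_update leaves it out of to_download, and
only new or changed files (typically the much smaller wiki text) are
transferred.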

A little further off topic: We would also like to implement the
mediawiki mobile rendering (as m.orbit.educ.cam.ac.uk). If somebody
wanted to help us with this, we would really appreciate it.

I'd certainly be happy to engage in the discussion, and help / test
new ideas in our context!

All the best,
Bjoern

On 13 November 2013 09:59, Emmanuel Engelhart <[email protected]> wrote:
>
> Hi Erik
>
> This is great to see you speaking about this now.
>
> Le 13/11/2013 06:51, Erik Moeller a écrit :
> > how important is ZIM support in Collections (the "Create a book"
> > feature) on Wikimedia sites? We implemented this a while ago to
> > support offline efforts. Since collections are still typically very
> > much limited in size, it's not a very viable option for huge offline
> > exports, more for batches of articles on related topics. Do people
> > currently rely on this functionality for offline deployments?
>
> Kiwix, as a project, does not rely directly on the WM ZIM export, but
> many of our users do. Of course they suffer of the limitations of the
> current solution and frankly: most of them are not aware of this feature.
>
> So, this would be for us an impairment. But, I agree something should be
> done. IMO we should somehow try to get 3 important output formats:
> * PDF (adapted for really small collections)
> * EPUB (the most used free ebook format)
> * ZIM (for bigger collections)
>
> > We're re-implementing the rendering pipeline for Collections to ensure
> > long-term maintainability, and our default would be to eliminate
> > initially all formats except for PDF if we don't absolutely have to
> > support them. I'll see if we can get some metrics on current ZIM file
> > usage via the Collection extension, but it'd be nice to get
> > qualitative feedback as well.
> >
> > (More background at: https://www.mediawiki.org/wiki/PDF_rendering )
>
> I have also seen your email to wikiteck-l:
> http://lists.wikimedia.org/pipermail/wikitech-l/2013-November/073059.html
>
> I think the choice of Parsoid as a rendering backend is a really good
> one for PDF. I have always been advocating the HTML2PDF approach. I also
> think that Parsoid delivers the mandatory information to hack the HTML
> correctly and adapt it for offline usage.
>
> That's why I have been working since March on a solution called
> mwoffliner (also using nodejs, like Parsoid):
> https://sourceforge.net/p/kiwix/other/ci/master/tree/mwoffliner/mwoffliner.js
>
> Mwoffliner:
> 1 - Download a selection of articles from the Parsoid API
> 2 - Rewrite the HTML code
> 3 - Write the ZIM file (not yet implemented, files are written on the
> filesystem)
>
> You can have a idea of the rendering with this whole WPRU collection
> (ZIM file served with kiwix-serve):
> http://library.kiwix.org/wikipedia_ru_all
>
> If I correctly understand, points 1&2 are similar to what you plan to do
> for the new PDF pipeline. So, this would be great to collaborate on this
> and maintain the ZIM output. How does it sounds?
>
> Emmanuel
> --
> Kiwix - Wikipedia Offline & more
> * Web: http://www.kiwix.org
> * Twitter: https://twitter.com/KiwixOffline
> * more: http://www.kiwix.org/wiki/Communication
>
> _______________________________________________
> Offline-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/offline-l
