https://bugzilla.wikimedia.org/show_bug.cgi?id=28706

--- Comment #5 from Brett Zamir <[email protected]> ---
Sorry, I had not seen notification of the comment from Andre. No, this need is
not met by those projects.

And thank you for your engagement on this!

My proposal here is to use the emerging web standards for offline
applications to let users of MediaWiki access as many features as possible
offline, including viewing previously saved content.

There are two main differences from the projects you mentioned:

1) The user would not need to download ALL content of Wikipedia for offline
browsing. They could download for offline use just those pages (or perhaps
categories) of interest to them, or be allowed to configure the browser to
store every page once visited, with the further option of downloading the
whole site at once. (Keeping things in sync would be very nice, but no doubt
impractical for a site with as many changes as Wikipedia, unless perhaps it
were confined to updating the page histories of interest rather than all
content.)

2) There would not be any need for additional software beyond a (modern)
browser. While I understand it is a goal of Wikipedia to support all browsers
having any significant user base, I believe the benefits of adopting this
emerging standard before it is implemented in all browsers are compelling
enough to let users with supporting browsers take advantage of this
capability today, while non-supporting browsers simply continue to use
Wikipedia in the same manner as before (i.e., without offline capability). If
implemented in this standards-based manner, users of other browsers will also
be able to benefit from these features as those browsers are upgraded.

The specific web technologies required include:

1) Cache manifests, which would allow Wikimedia's servers to send the HTML,
CSS, JavaScript, and image files behind the outward-facing core of Wikipedia
to the user's browser so that those files continue to work offline when
reading the pages stored via item #2.
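As a rough sketch, such a cache manifest might look like the following (the
file paths here are illustrative, not Wikipedia's actual resource paths); a
page opts in by pointing its <html manifest="offline.appcache"> attribute at
this file:

```
CACHE MANIFEST
# v1 -- editing this comment forces clients to re-fetch the listed files

CACHE:
/skins/common/shared.css
/skins/common/wikibits.js
/skins/common/images/wiki-logo.png

NETWORK:
# Everything not listed above still requires a network connection.
*
```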

2) IndexedDB offline database storage. This would store the pages chosen by
the user (e.g., by clicking links like "download this category for offline
use" or "download this page for offline use"). Ideally, users could also
optionally have entire page histories downloaded in addition to the most
current copy.

Since a complete implementation would be a vast undertaking, I would propose a
phased approach such as the following:

1) Implement cache manifests, so that supporting browsers could permanently
cache the HTML/CSS/JavaScript/image files used by Wikipedia, with the benefit
of faster performance for users and lighter demand on the server--browsers
would never need to ask Wikipedia for new copies of these permanently stored
files after first obtaining them, beyond pinging Wikimedia's servers (when
online) to check whether any of those files had updates to be
auto-downloaded. Browsers do currently cache Wikipedia's files in a similar
manner, but they don't reserve permanent space for these files, so the
browser will periodically need to re-download them, slowing its visits and
adding demand on the server.

2) Implement IndexedDB storage for viewing of content pages specified by the
user (explicitly chosen individual pages, every page once visited, an entire
category of pages, the entire site, etc.). This might start with downloading
only the most current revision of a page, later adding the option to download
the entire page history for offline viewing.
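A minimal sketch of this step, assuming a hypothetical "offline-wiki"
database and savePage() helper (these names and the record shape are
illustrative, not a MediaWiki API):

```javascript
// Normalize a page title into a stable object-store key.
function pageKey(title) {
  return title.trim().replace(/ /g, '_');
}

// Store the rendered HTML of one page under its normalized title.
function savePage(db, title, html) {
  const tx = db.transaction('pages', 'readwrite');
  tx.objectStore('pages').put({ key: pageKey(title), html, saved: Date.now() });
  return new Promise((resolve, reject) => {
    tx.oncomplete = resolve;
    tx.onerror = () => reject(tx.error);
  });
}

// Only attempt to open a database where IndexedDB is available (browsers);
// non-supporting environments simply skip the offline path.
if (typeof indexedDB !== 'undefined') {
  const open = indexedDB.open('offline-wiki', 1);
  open.onupgradeneeded = () =>
    open.result.createObjectStore('pages', { keyPath: 'key' });
  open.onsuccess = () => savePage(open.result, 'Main Page', '<p>...</p>');
}
```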

3) Implement offline search in a manner similar to the current online search
capabilities (but with a view toward supporting more sophisticated searches
such as those described in step #6 below).
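To illustrate the idea, a deliberately naive offline search over locally
stored pages might look like this ("pages" stands in for records read back
from the offline store, and the term-counting score is a placeholder, not a
proposal for the actual ranking):

```javascript
// Rank locally stored pages by how many query terms they contain.
function searchOffline(pages, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return pages
    .map((page) => {
      const haystack = (page.title + ' ' + page.text).toLowerCase();
      // Score = number of query terms appearing anywhere in the page.
      const score = terms.filter((t) => haystack.includes(t)).length;
      return { page, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.page.title);
}

// Example records, as if read back from the offline store.
const stored = [
  { title: 'Offline application', text: 'Cache manifests let pages work offline.' },
  { title: 'IndexedDB', text: 'A browser database for structured storage.' },
];
const hits = searchOffline(stored, 'offline cache');
```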

4) Move Mediawiki away from PHP/MySQL to a standard Server-Side JavaScript
solution to allow for code sharing between the server and client since the more
features implemented in JavaScript (in the right way) would mean an easier task
of supporting the same capabilities offline. This would also enhance
performance for users who had cached the content.

For example, currently if one wishes to compare two revisions and be shown
the "diffs", one has to request this of the server, with the network
connection being the largest performance bottleneck (especially where
internet connectivity is poor). If implemented in JavaScript, this
functionality could run offline.
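As a sketch of the sort of client-side diffing meant here, a minimal
line-based comparison via longest common subsequence could run entirely in
the browser (this is an illustration, not MediaWiki's actual diff engine):

```javascript
// Compare two revisions line by line, emitting unified-diff-style output:
// "  " unchanged, "- " removed from the old text, "+ " added in the new.
function diffLines(oldText, newText) {
  const a = oldText.split('\n');
  const b = newText.split('\n');
  // Build the longest-common-subsequence length table, bottom-up.
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table to emit the diff.
  const out = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { out.push('  ' + a[i]); i++; j++; }
    else if (lcs[i + 1][j] >= lcs[i][j + 1]) { out.push('- ' + a[i]); i++; }
    else { out.push('+ ' + b[j]); j++; }
  }
  while (i < a.length) out.push('- ' + a[i++]);
  while (j < b.length) out.push('+ ' + b[j++]);
  return out;
}
```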

Likewise, it would be possible even to save up user edits so that when they
were online again, the system could ask them whether they wished to submit
their edits back to the server.

Of course, the longer the user had been offline, and the higher the traffic of
the Mediawiki site they were using (e.g., Wikipedia), the more likely they
would be to run into conflicts (pointing perhaps to the desirability of a
better file merge capability).
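The saved-up edits and the conflict check could be sketched as follows
(queueEdit(), partitionOnReconnect(), and the revision-id fields are
hypothetical names for illustration, not a MediaWiki API):

```javascript
// Record an edit made offline, remembering which revision it was based on.
function queueEdit(queue, title, baseRevId, newText) {
  queue.push({ title, baseRevId, newText });
  return queue;
}

// On reconnecting: edits whose base revision is still the server's latest
// can be submitted directly; the rest need a merge (or the user's decision).
function partitionOnReconnect(queue, latestRevIds) {
  const submit = [], conflicts = [];
  for (const edit of queue) {
    (latestRevIds[edit.title] === edit.baseRevId ? submit : conflicts).push(edit);
  }
  return { submit, conflicts };
}

// Example: one clean edit, one page changed on the server while offline.
const queue = [];
queueEdit(queue, 'Sandbox', 101, 'new sandbox text');
queueEdit(queue, 'Main Page', 7, 'local note');
const { submit, conflicts } =
  partitionOnReconnect(queue, { Sandbox: 101, 'Main Page': 9 });
```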

One other benefit of this language shift might be to stimulate MediaWiki
developers toward providing more user-friendly (Ajax-based) designs that do
not always force the user to wait for an entire page refresh.

5) Facilitate a "distributed wiki" or decentralized wiki model. The
improvements of #4 could be progressively enhanced to let users make "forks"
(i.e., their own independent versions) of the content, so they could store
and view their own independent versions of pages of interest to them (e.g.,
by adding their own notes to wiki content), and perhaps even submit their
modified version to a server of their choice if they wished to publish their
fork. While the benefits of Wikipedia often come from the community working
together, if the technology facilitated such a distributed model, there would
also be room for easily sharing Wikipedia-based content (including small
portions) amongst a community of experts. This would be of use not only with
Wikipedia content, but for the MediaWiki software in general.

6) With a move toward WYSIWYG editing and client-side wiki language
processing facilitated by #4, the XHTML output could also be shared with
users, making it predictably queryable by power users. The likes of a jQuery
plugin could allow such power users to find content within a particular
category, merge it with content from another page or category, and then
display the output. This capability would not need to be enabled on the
server, since such arbitrary queries could be demanding on server
performance, but that would be no obstacle for users running the queries on
their own machines.
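The kind of client-side query meant here could be sketched like this,
assuming locally stored pages in a predictable structure (the data shape and
mergeCategory() are illustrative, not an existing plugin):

```javascript
// Select locally stored pages in a category, merge in one extra page,
// and produce a single combined document for display.
function mergeCategory(pages, category, extraTitle) {
  const inCategory = pages.filter((p) => p.categories.includes(category));
  const extra = pages.find((p) => p.title === extraTitle);
  const selected = extra ? inCategory.concat([extra]) : inCategory;
  // Concatenate the selected pages under simple section headings.
  return selected.map((p) => '== ' + p.title + ' ==\n' + p.text).join('\n\n');
}

// Example local store contents.
const local = [
  { title: 'Mercury', categories: ['Planets'], text: 'Closest to the Sun.' },
  { title: 'Mars', categories: ['Planets'], text: 'The red planet.' },
  { title: 'Moon', categories: ['Satellites'], text: "Earth's satellite." },
];
const report = mergeCategory(local, 'Planets', 'Moon');
```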

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are watching all bug changes.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l