https://bugzilla.wikimedia.org/show_bug.cgi?id=47406
Christian Puehringer <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #8 from Christian Puehringer <[email protected]> ---
I agree that it is a good idea to support both replacement of entire articles and diffs. I believe that omitting diffs may have a huge impact on file size: articles which are updated infrequently, and there are a lot of them, often have changes which affect only a single line, such as spelling fixes or interwiki link updates. Thus, not supporting diffs may lead to very large diff files.

On the other hand, even when diff support is available, it may still make sense to store complete articles instead of diffs for some articles, for example to reduce the processing effort in diff creation, but also in merging.

One other point worth considering is disk space usage during merge. In particular on mobile devices, once the planned download manager is implemented, the incremental update capability would be very useful. However, if the update process requires free space for the old file, the diff, and the new file, it won't be usable in many cases because there is not enough free space on mobile devices. Therefore something like updating the old zim file in place, and merging while downloading the diff, would make sense. This is certainly not trivial, and hardly possible to implement within the GSoC, but it would make sense to keep it in mind when defining the diff and merge processes and the diff file format, so that this feature could be added in the future. For example, with such an approach it is important that the zim file stays consistent during the update and that resuming later is possible.

Note: If an actual in-place update is not feasible, the zim-split feature could be used, e.g. on download the full zim file is split into chunks of, let's say, 64 MB.
On update, each new chunk (its size may change) is written completely before the old one is deleted. Thus during the update only 64 MB of additional storage is required, while without in-place update tens of GB could be necessary.

Another interesting feature could be to support on-the-fly merge: when this feature is used, the diff zim file is not merged during the update. Instead it is just stored beside the old zim file, and the zimlib merges the articles from both (or more) files on access. The benefit is that the end user does not need to run the probably rather long-running merge task. In addition it is faster, because no recompression needs to be done. Furthermore, no additional storage is required during the update. The drawback is that it uses more storage, so whether this approach is actually feasible depends on the size of the diff files. Anyway, I think it's worth considering this feature, as it should not require much additional effort; in fact it could even be an intermediate step for the currently planned approach. Supporting multiple deltas would certainly make sense for this on-the-fly approach, but this could be implemented later.
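
To make the on-the-fly merge idea a bit more concrete, here is a rough Python sketch of the lookup order it implies: ask the newest delta first, then older deltas, then the base file. Everything in it is an assumption for illustration only. OverlayReader and DELETED are hypothetical names, plain dicts stand in for opened zim files, and each delta is assumed to hold complete replacement articles rather than line diffs. It is not the zimlib API.

```python
from typing import Dict, List, Optional

# Hypothetical marker for an article that a delta removes (not a real zim concept).
DELETED = None


class OverlayReader:
    """Resolve articles across a base archive and zero or more deltas.

    Plain dicts stand in for opened zim files (an assumption for this sketch);
    a real reader would consult each file's own index instead.
    """

    def __init__(self, base: Dict[str, str],
                 deltas: Optional[List[Dict[str, Optional[str]]]] = None):
        self.base = base
        # Deltas are ordered from oldest to newest.
        self.deltas = list(deltas or [])

    def get(self, url: str) -> Optional[str]:
        # Walk the deltas newest first so that the latest update wins.
        for delta in reversed(self.deltas):
            if url in delta:
                return delta[url]  # may be DELETED (None)
        # No delta mentions the article: fall back to the base file.
        return self.base.get(url)


base = {"A/Vienna": "old text", "A/Graz": "unchanged"}
delta1 = {"A/Vienna": "new text", "A/Linz": "added"}
delta2 = {"A/Linz": DELETED}
reader = OverlayReader(base, [delta1, delta2])
```

Nothing is rewritten or recompressed here, which matches the speed and storage argument above; the price is one extra lookup per delta on every access, so the cost grows with the number of deltas kept beside the base file.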
