https://bugzilla.wikimedia.org/show_bug.cgi?id=49143

       Web browser: ---
            Bug ID: 49143
           Summary: Store HTML and page properties with multi-part content
                    handler
           Product: Parsoid
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: General
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]
    Classification: Unclassified
   Mobile Platform: ---

MediaWiki currently stores the entire page content as WikiText. In addition to
WikiText, we would like to store:

* The fully expanded HTML DOM
* Page properties: categories, magic word flags (notoc etc.), DISPLAYTITLE, bug
48812, etc.
* Parsoid-internal information: essentially the data-parsoid attributes, moved
out of the main page DOM

Eventually we'd also like to be able to drop WikiText storage without having to
rework the storage architecture.

In the current MediaWiki external storage and ContentHandler architecture, this
can be achieved by adding a multi-part content type with a corresponding
ContentHandler. The compound document could be serialized as a JSON object or
in some other format.
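
As a rough illustration of the JSON option, here is a minimal Python sketch of
a multi-part document. The part names (wikitext, html, page_properties,
data_parsoid) and both helper functions are assumptions made for this example;
the report does not define a schema, and a real ContentHandler would be written
in PHP inside MediaWiki.

```python
import json

# Hypothetical part names; the bug report does not specify a schema.
PART_NAMES = ("wikitext", "html", "page_properties", "data_parsoid")


def serialize_compound(parts):
    """Serialize a dict of named parts into a single JSON string."""
    unknown = set(parts) - set(PART_NAMES)
    if unknown:
        raise ValueError(f"unknown parts: {sorted(unknown)}")
    # Every part is optional, so the wikitext part could later be omitted
    # without changing the container format.
    return json.dumps({"parts": parts}, sort_keys=True)


def deserialize_compound(blob):
    """Return the dict of parts stored in a serialized compound document."""
    return json.loads(blob)["parts"]


example = {
    "wikitext": "__NOTOC__\n[[Category:Example]]\nHello",
    "html": "<p>Hello</p>",
    "page_properties": {"categories": ["Example"], "notoc": True},
    "data_parsoid": {"ids": {}},
}

blob = serialize_compound(example)
restored = deserialize_compound(blob)
```

Because no part is mandatory in this sketch, a document without the wikitext
part would still round-trip, which matches the goal of eventually dropping
WikiText storage without reworking the container.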

A possible downside of the compound document approach stems from the need to
update transclusion or image expansions for a given revision. With append-only,
immutable external storage, this can be implemented by storing a new compound
document and then updating the revision to point to it. Without garbage
collection, each such update leaves another copy of the unmodified WikiText and
page properties in external storage. However, this issue should probably be
addressed in the storage layer.
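
The duplication described above can be seen in a toy model. The following
Python sketch assumes a hypothetical in-memory append-only store and a
revision-to-address table; the names AppendOnlyStore, revision_pointer, save
and load are invented for this example and are not MediaWiki APIs.

```python
import json


class AppendOnlyStore:
    """Toy append-only blob store: blobs are never modified or deleted."""

    def __init__(self):
        self._blobs = []

    def put(self, blob):
        self._blobs.append(blob)
        return len(self._blobs) - 1  # address of the newly stored blob

    def get(self, address):
        return self._blobs[address]

    def __len__(self):
        return len(self._blobs)


store = AppendOnlyStore()
revision_pointer = {}  # revision id -> address of its current compound document


def save(rev_id, parts):
    """Store a new compound document and point the revision at it."""
    revision_pointer[rev_id] = store.put(json.dumps(parts, sort_keys=True))


def load(rev_id):
    """Load the compound document the revision currently points to."""
    return json.loads(store.get(revision_pointer[rev_id]))


WIKITEXT = "{{Infobox}} text"
save(1, {"wikitext": WIKITEXT, "html": "<p>old expansion</p>"})

# A transcluded template changed: re-expand the HTML for the same revision.
doc = load(1)
doc["html"] = "<p>new expansion</p>"
save(1, doc)

# Count how many stored blobs carry the unchanged wikitext.
wikitext_copies = sum(
    1
    for address in range(len(store))
    if json.loads(store.get(address))["wikitext"] == WIKITEXT
)
```

After a single re-expansion the store holds two blobs, both containing the same
wikitext, while the revision points only at the newer one. The older blob is
the kind of garbage that the storage layer would need to reclaim.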
