André, another option for anchoring annotations (also linked from the task Subbu mentioned) is Hypothesis' approximate-match algorithm: https://github.com/hypothesis/dom-anchor-text-quote
This approach uses XPaths of a selection where available (which would benefit from stable element ids), but falls back to approximate phrase matching with some context. They use this to annotate arbitrary web pages and PDFs: https://hypothes.is/

Gabriel

On Mon, Nov 2, 2015 at 5:21 AM, André Costa <[email protected]> wrote:

> Hi Subbu,
>
> Many thanks for your answer. It confirmed some of my thoughts on how this
> might be done.
>
> I'll take this back to our team and get back if I have any updates.
>
> Cheers,
> André
>
> André Costa | GLAM technician, Wikimedia Sverige | [email protected]
> | +46 (0)733-964574
>
> Support free knowledge, become a member of Wikimedia Sverige.
> Read more at blimedlem.wikimedia.se
>
> On 28 October 2015 at 19:17, Subramanya Sastry <[email protected]>
> wrote:
>
>> I think you are looking for a solution that can attach metadata to
>> specific places in the DOM -- there have been other contexts where this has
>> come up as well. So, I think we need a generic solution to do this.
>>
>> That said, Parsoid assigns ids to individual elements in the DOM, and so
>> an easy way to do this would be to store this data keyed on element ids and
>> then look up this metadata separately.
>>
>> As for stability, we right now don't guarantee it, but this has come up
>> previously ( https://phabricator.wikimedia.org/T116350 ) and we haven't
>> tackled it because there hasn't been a compelling use case that would
>> benefit immediately from it, and we cannot reliably guarantee that the ids
>> will continue to be stable across a series of wikitext edits.
>>
>> But, on an edit-to-edit basis, Parsoid already does dom-diffs and
>> identifies only the edited portions of the DOM (and this is used internally
>> to support no-dirty-diff serialization of edited HTML to wikitext).
>> However, this functionality is not currently exposed outside of internal
>> Parsoid use.
>>
>> This doesn't answer your questions directly, but I hope it is at least in
>> the direction of what you are looking for.
>>
>> Subbu.
>>
>> On 10/28/2015 06:31 AM, André Costa wrote:
>>
>> I have some general Parsoid questions I hoped someone here might help me
>> with.
>>
>> The background is that we are doing some preliminary work looking at how
>> Text-to-Speech might work on Wikipedia (there will be some info online in
>> the coming weeks).
>>
>> One detail of this is that you might occasionally have to highlight
>> specific words/sentences that are dealt with differently (e.g. World War
>> III -> World War 3). It is still unclear how frequent such things would be,
>> but if they are very frequent then there would likely be push-back from the
>> community if this is stored in the normal wikitext.
>>
>> In this case we would have to store the markup outside of the wikitext,
>> and any viewing/editing of it would have to happen in some user-enabled
>> extension of the normal environment.
>>
>> And here we come to the questions.
>>
>> 1. If we would have to store this markup outside of the wikitext, could
>>    this be done by storing the individual parsoid-data-units?
>> 2. Would it be possible to add these units to the existing parsoid-data
>>    (which gets loaded from the wikitext) when loading a page?
>> 3. Would it be possible to detect which of these units would be affected
>>    by edits to the wikipage?
>>
>> This is still in the early stages, so mainly we are looking at what
>> possibilities exist should we need them. Using Parsoid data was something
>> we thought of as a light-weight alternative to having to store a synced
>> copy of the wikitext+additional markup.
>>
>> Cheers,
>> André
>>
>> _______________________________________________
>> Wikitext-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikitext-l

-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
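
For illustration only, here is a minimal Python sketch of the fallback idea described above: find the stored phrase approximately, and use the text just before and after it as context to pick between candidates. This is not the dom-anchor-text-quote implementation. The names `anchor_quote` and `similarity`, the 0.7 threshold, and the 0.25 context weights are assumptions made up for this example, and it works on a plain string rather than a DOM.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; two empty strings count as a perfect match."""
    if not a and not b:
        return 1.0
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def anchor_quote(text, exact, prefix="", suffix="", threshold=0.7):
    """Return (start, end) of the best match for `exact` in `text`, or None.

    Hypothetical helper: scores every window of len(exact) characters against
    the stored quote, discards windows below `threshold`, and uses the stored
    prefix/suffix context to choose between the remaining candidates.
    """
    n = len(exact)
    if n == 0 or n > len(text):
        return None
    best_score, best_span = 0.0, None
    for start in range(len(text) - n + 1):
        end = start + n
        quote_score = similarity(text[start:end], exact)
        if quote_score < threshold:
            continue
        # Compare the text around this window with the stored context.
        ctx_before = text[max(0, start - len(prefix)):start]
        ctx_after = text[end:end + len(suffix)]
        score = (quote_score
                 + 0.25 * similarity(ctx_before, prefix)
                 + 0.25 * similarity(ctx_after, suffix))
        if score > best_score:
            best_score, best_span = score, (start, end)
    return best_span


if __name__ == "__main__":
    page = ("In one novel, World War III began in 1962. "
            "Later novels describe World War III differently.")
    span = anchor_quote(page, "World War III",
                        prefix="describe ", suffix=" differently")
    print(span, repr(page[span[0]:span[1]]))
```

With the context given, the sketch picks the second "World War III" rather than the first, and a slightly edited phrase can still be found as long as it stays above the threshold. The brute-force window scan is slow on long pages; it is only meant to show the idea.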
