Hi Gabriel,

Thanks for the tip. I'll pass this one along as well.

Cheers,
André

André Costa | GLAM developer, Wikimedia Sverige | [email protected] | +46 (0)733-964574

Support free knowledge, become a member of Wikimedia Sverige.
Read more at blimedlem.wikimedia.se

On 3 November 2015 at 01:11, Gabriel Wicke <[email protected]> wrote:

> André, another option to anchor annotations (which is also linked from the
> task Subbu mentioned) is Hypothesis' approximate match algorithm:
> https://github.com/hypothesis/dom-anchor-text-quote
>
> This approach uses xpaths of a selection where available (which would
> profit from stable element ids), but falls back to approximate phrase
> matching with some context. They use this to annotate arbitrary web pages
> and PDFs: https://hypothes.is/
>
> Gabriel
>
> On Mon, Nov 2, 2015 at 5:21 AM, André Costa <[email protected]> wrote:
>
>> Hi Subbu,
>>
>> Many thanks for your answer. It confirmed some of my thoughts on how this
>> might be done.
>>
>> I'll take this back to our team and get back if I have any updates.
>>
>> Cheers,
>> André
>>
>> On 28 October 2015 at 19:17, Subramanya Sastry <[email protected]> wrote:
>>
>>> I think you are looking for a solution that can attach metadata to
>>> specific places in the DOM. There have been other contexts where this
>>> has come up as well, so I think we need a generic solution to do this.
>>>
>>> That said, Parsoid assigns ids to individual elements in the DOM, so
>>> an easy way to do this would be to store this data keyed on element ids
>>> and then look up this metadata separately.
>>>
>>> As for stability, we don't guarantee it right now. This has come up
>>> previously ( https://phabricator.wikimedia.org/T116350 ), but we haven't
>>> tackled it because there hasn't been a compelling use case that would
>>> benefit immediately from it, and we cannot reliably guarantee that the
>>> ids will continue to be stable across a series of wikitext edits.
>>>
>>> But, on an edit-to-edit basis, Parsoid already does dom-diffs and
>>> identifies only the edited portions of the DOM (and this is used
>>> internally to support no-dirty-diff serialization of edited HTML to
>>> wikitext). However, this functionality is not currently exposed outside
>>> of internal Parsoid use.
>>>
>>> This doesn't answer your questions directly, but I hope it is at least
>>> in the direction of what you are looking for.
>>>
>>> Subbu.
>>>
>>> On 10/28/2015 06:31 AM, André Costa wrote:
>>>
>>>> I have some general Parsoid questions I hoped someone here might help
>>>> me with.
>>>>
>>>> The background is that we are doing some preliminary work looking at
>>>> how Text-to-Speech might work on Wikipedia (there will be some info
>>>> online in the coming weeks).
>>>>
>>>> One detail of this is that you might occasionally have to highlight
>>>> specific words/sentences that are dealt with differently (e.g. World
>>>> War III -> World War 3). It is still unclear how frequent such things
>>>> would be, but if they are very frequent then there would likely be
>>>> push-back from the community if this is stored in the normal wikitext.
>>>>
>>>> In this case we would have to store the markup outside of the wikitext,
>>>> and any viewing/editing of it would have to happen in some user-enabled
>>>> extension of the normal environment.
>>>>
>>>> And here we come to the questions.
>>>> 1. If we had to store this markup outside of the wikitext, could this
>>>>    be done by storing the individual parsoid-data-units?
>>>> 2. Would it be possible to add these units to the existing parsoid-data
>>>>    (which gets loaded from the wikitext) when loading a page?
>>>> 3. Would it be possible to detect which of these units would be
>>>>    affected by edits to the wikipage?
>>>>
>>>> This is still in the early stages, so mainly we are looking at what
>>>> possibilities exist should we need them. Using Parsoid data was
>>>> something we thought of as a light-weight alternative to having to
>>>> store a synced copy of the wikitext+additional markup.
>>>>
>>>> Cheers,
>>>> André
>
> --
> Gabriel Wicke
> Principal Engineer, Wikimedia Foundation

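For reference, the two anchoring ideas in this thread (metadata keyed on element ids, with a fallback to approximate phrase matching plus some context) might be combined roughly as follows. This is only a sketch under assumptions: it is neither Parsoid's API nor the dom-anchor-text-quote implementation. The `Annotation` fields, the `resolve` function, the 0.8 threshold and the sample ids are all made up for illustration, and Python's standard `difflib` stands in for a real approximate matcher.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, Optional


@dataclass
class Annotation:
    """Out-of-wikitext metadata attached to a phrase (hypothetical layout)."""
    element_id: str   # id of the element the phrase was in when annotated
    exact: str        # the annotated phrase itself
    prefix: str = ""  # a little text before the phrase, for disambiguation
    suffix: str = ""  # a little text after the phrase
    note: str = ""    # e.g. a pronunciation hint for text-to-speech


def best_window_score(text: str, target: str) -> float:
    """Highest similarity between target and any same-length window of text."""
    if not target:
        return 0.0
    width = len(target)
    last_start = max(len(text) - width, 0)
    return max(
        SequenceMatcher(None, text[i:i + width], target).ratio()
        for i in range(last_start + 1)
    )


def resolve(ann: Annotation, elements: Dict[str, str],
            threshold: float = 0.8) -> Optional[str]:
    """Return the id of the element the annotation now belongs to, or None.

    `elements` maps element id -> text content of the current revision.
    """
    # 1. Fast path: the stored id still exists and still contains the phrase.
    if ann.exact in elements.get(ann.element_id, ""):
        return ann.element_id

    # 2. Fallback: ids are not guaranteed stable across edits, so look for
    #    the phrase plus its context approximately in every element.
    target = ann.prefix + ann.exact + ann.suffix
    best_id, best_score = None, 0.0
    for element_id, text in elements.items():
        score = best_window_score(text, target)
        if score > best_score:
            best_id, best_score = element_id, score
    return best_id if best_score >= threshold else None


# Sample data; the ids below are invented for the example.
ann = Annotation("mwAb", "World War III", prefix="start of ",
                 suffix=" was", note="World War 3")

original = {"mwAb": "The start of World War III was feared."}
shifted = {"mwAb": "A new lead paragraph.",
           "mwAd": "The start of World War III was widely feared."}
removed = {"mwAb": "Something unrelated entirely."}

print(resolve(ann, original))  # id still valid
print(resolve(ann, shifted))   # id changed, phrase found via context
print(resolve(ann, removed))   # phrase gone, annotation is orphaned
```

An annotation that resolves to `None` would need manual review, which relates to question 3 above: without stable ids or exposed dom-diffs, detecting affected units comes down to this kind of re-matching after each edit.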
_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
