André, another option for anchoring annotations (also linked from the
task Subbu mentioned) is Hypothesis's approximate-match algorithm:
https://github.com/hypothesis/dom-anchor-text-quote

This approach uses XPaths of a selection where available (which would
benefit from stable element ids), but falls back to approximate phrase
matching with some context. They use this to annotate arbitrary web pages
and PDFs: https://hypothes.is/
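
To make the fallback idea concrete, here is a minimal sketch of approximate
phrase matching with context, written against plain text rather than a DOM.
It is not the dom-anchor-text-quote API: the names TextQuote, similarity and
find_quote, the 0.7 threshold, and the scoring scheme are all assumptions
made for illustration. It also ignores the XPath / element-id path entirely,
and it uses a fixed-width window, so it only tolerates small edits that keep
the phrase roughly the same length.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional, Tuple


@dataclass
class TextQuote:
    """Hypothetical selector: the annotated phrase plus a little context."""
    exact: str
    prefix: str = ""
    suffix: str = ""


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; two empty strings count as identical."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def find_quote(text: str, quote: TextQuote,
               min_ratio: float = 0.7) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the best match for quote.exact, or None.

    Candidates are ranked by phrase similarity first; the surrounding
    context only breaks ties (e.g. when the phrase occurs several times).
    """
    n = len(quote.exact)
    best_key = None
    best_start = -1
    for start in range(len(text) - n + 1):
        end = start + n
        phrase = similarity(quote.exact, text[start:end])
        if phrase < min_ratio:
            continue  # too different to be the annotated phrase
        before = text[max(0, start - len(quote.prefix)):start]
        after = text[end:end + len(quote.suffix)]
        context = similarity(quote.prefix, before) + similarity(quote.suffix, after)
        key = (phrase, context)
        if best_key is None or key > best_key:
            best_key, best_start = key, start
    if best_key is None:
        return None
    return best_start, best_start + n


if __name__ == "__main__":
    quote = TextQuote(exact="World War III", prefix="feared ", suffix=" would")
    page = ("World War II ended in 1945. Some feared World War III would "
            "follow. World War III never came.")
    span = find_quote(page, quote)
    if span is not None:
        print(page[span[0]:span[1]])
```

The stored prefix and suffix are what let the annotation survive when the
same phrase appears more than once, or when nearby text is edited. A real
implementation would presumably try the cheaper id/XPath lookup first and
only scan the text like this when that lookup fails.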

Gabriel

On Mon, Nov 2, 2015 at 5:21 AM, André Costa <[email protected]>
wrote:

> Hi Subbu,
>
> Many thanks for your answer. It confirmed some of my thoughts on how this
> might be done.
>
> I'll take this back to our team and get back if I have any updates.
>
> Cheers,
> André
>
> André Costa | GLAM-tekniker, Wikimedia Sverige | [email protected]
> | +46 (0)733-964574
>
> Support free knowledge, become a member of Wikimedia Sverige.
> Read more at blimedlem.wikimedia.se
>
> On 28 October 2015 at 19:17, Subramanya Sastry <[email protected]>
> wrote:
>
>>
>> I think you are looking for a solution that can attach metadata to
>> specific places in the DOM -- there have been other contexts where this has
>> come up as well. So, I think we need a generic solution to do this.
>>
>> That said, Parsoid assigns ids to individual elements in the DOM, so an
>> easy way to do this would be to store this data keyed on element ids and
>> then look up this metadata separately.
>>
>> As for stability, we don't currently guarantee it. This has come up
>> previously ( https://phabricator.wikimedia.org/T116350 ), but we haven't
>> tackled it because there hasn't been a compelling use case that would
>> benefit immediately from it, and we cannot reliably guarantee that the ids
>> will continue to be stable across a series of wikitext edits.
>>
>> But, on an edit-to-edit basis, Parsoid already does dom-diffs and
>> identifies only the edited portions of the DOM (and this is used internally
>> to support no-dirty-diff serialization of edited HTML to wikitext).
>> However, this functionality is not exposed currently outside of internal
>> Parsoid use.
>>
>> This doesn't answer your questions directly, but I hope this is at least
>> in the direction of what you are looking for.
>>
>> Subbu.
>>
>>
>> On 10/28/2015 06:31 AM, André Costa wrote:
>>
>> I have some general Parsoid questions I hoped someone here might help me
>> with.
>>
>> The background is that we are doing some preliminary work looking at how
>> Text-to-Speech might work on Wikipedia (there will be some info online in
>> the coming weeks).
>>
>> One detail of this is that you might occasionally have to highlight
>> specific words/sentences that are dealt with differently (e.g. World War
>> III -> World War 3). It is still unclear how frequent such things would be
>> but if they are very frequent then there would likely be push-back from the
>> community if this is stored in the normal wikitext.
>>
>> In this case we would have to store the markup outside of the wikitext
>> and any viewing/editing of it would have to happen in some user-enabled
>> extension of the normal environment.
>>
>> And here we come to the questions.
>> 1. If we would have to store this markup outside of the wikitext could
>> this be done by storing the individual parsoid-data-units?
>> 2. Would it be possible to add these units to the existing parsoid-data
>> (which gets loaded from the wikitext) when loading a page?
>> 3. Would it be possible to detect which of these units would be affected
>> by edits to the wikipage?
>>
>> This is still in the early stages so mainly we are looking at what
>> possibilities exist should we need them. Using Parsoid data was something
>> we thought of as a light-weight solution to having to store a synced copy
>> of the wikitext+additional markup.
>>
>> Cheers,
>> André
>> André Costa | GLAM-tekniker, Wikimedia Sverige |
>> [email protected] | +46 (0)733-964574
>>
>> Support free knowledge, become a member of Wikimedia Sverige.
>> Read more at blimedlem.wikimedia.se
>>
>>
>>
>> _______________________________________________
>> Wikitext-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikitext-l
>>
>>
>>
>
>
>


-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
