Re: [Wikitext-l] Storing stand-alone Parsoid data

André Costa Fri, 06 Nov 2015 07:20:15 -0800

Hi Gabriel,

Thanks for the tip. I'll pass this one along also.


Cheers,
André

André Costa | GLAM developer, Wikimedia Sverige | [email protected]
| +46 (0)733-964574

Support free knowledge, become a member of Wikimedia Sverige.
Read more at blimedlem.wikimedia.se

On 3 November 2015 at 01:11, Gabriel Wicke <[email protected]> wrote:

> André, another option to anchor annotations (which is also linked from the
> task Subbu mentioned) is hypothesis' approximate match algorithm:
> https://github.com/hypothesis/dom-anchor-text-quote
>
> This approach uses XPaths of a selection where available (which would
> benefit from stable element ids), but falls back to approximate phrase
> matching with some surrounding context. They use this to annotate
> arbitrary web pages and PDFs: https://hypothes.is/
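
To make the fallback described above concrete, here is a minimal sketch of
approximate quote matching with context. It is not the actual code of
dom-anchor-text-quote: the function names, the 0.7 threshold and the
context weighting are assumptions chosen for illustration, and the matching
uses Python's standard difflib rather than whatever the library uses.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def anchor_quote(text, quote, prefix="", suffix="", threshold=0.7):
    """Return (start, end) of the best approximate match, or None.

    Hypothetical helper: slides a window the size of the quote over the
    text, keeps windows that are similar enough to the quote, and uses
    the stored prefix/suffix context to choose between candidates.
    """
    best_total, best_span = 0.0, None
    width = len(quote)
    for start in range(max(len(text) - width, 0) + 1):
        end = start + width
        score = similarity(text[start:end], quote)
        if score < threshold:
            continue
        context = 0.0
        if prefix:
            before = text[max(0, start - len(prefix)):start]
            context += similarity(before, prefix)
        if suffix:
            after = text[end:end + len(suffix)]
            context += similarity(after, suffix)
        total = score + 0.5 * context  # assumed weighting
        if total > best_total:
            best_total, best_span = total, (start, end)
    return best_span


if __name__ == "__main__":
    page = "World War III began. Later, World War III ended."
    # The suffix context selects the second occurrence of the phrase.
    print(anchor_quote(page, "World War III", suffix=" ended"))
```

The sliding window compares every position against the quote, so this is
only suitable for short texts; it is meant to show the idea of "phrase plus
context", not to be efficient.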
>
> Gabriel
>
> On Mon, Nov 2, 2015 at 5:21 AM, André Costa <[email protected]>
> wrote:
>
>> Hi Subbu,
>>
>> Many thanks for your answer. It confirmed some of my thoughts on how this
>> might be done.
>>
>> I'll take this back to our team and get back if I have any updates.
>>
>> Cheers,
>> André
>>
>> André Costa | GLAM technician, Wikimedia Sverige | [email protected]
>> | +46 (0)733-964574
>>
>> Support free knowledge, become a member of Wikimedia Sverige.
>> Read more at blimedlem.wikimedia.se
>>
>> On 28 October 2015 at 19:17, Subramanya Sastry <[email protected]>
>> wrote:
>>
>>>
>>> I think you are looking for a solution that can attach metadata to
>>> specific places in the DOM -- there have been other contexts where this has
>>> come up as well. So, I think we need a generic solution to do this.
>>>
>>> That said, Parsoid assigns ids to individual elements in the DOM, so
>>> an easy way to do this would be to store this data keyed on element ids
>>> and then look up this metadata separately.
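
The id-keyed lookup suggested above could look roughly like the following
sketch. Everything in it is an assumption made for illustration: the
ANNOTATIONS store, its field names, the sample ids and the helper names are
hypothetical, and the HTML is a hand-written stand-in rather than real
Parsoid output.

```python
from html.parser import HTMLParser

# Hypothetical side store kept outside the wikitext: annotation data
# keyed on element id. Ids and field names are made up for this sketch.
ANNOTATIONS = {
    "mwBQ": {"match": "World War III", "speak_as": "World War 3"},
}


class AnnotationFinder(HTMLParser):
    """Collect (element id, annotation) pairs for annotated elements."""

    def __init__(self, annotations):
        super().__init__()
        self.annotations = annotations
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; look for an id we know.
        element_id = dict(attrs).get("id")
        if element_id in self.annotations:
            self.found.append((element_id, self.annotations[element_id]))


def find_annotations(html, annotations=ANNOTATIONS):
    """Return the stored annotations whose element id occurs in the HTML."""
    finder = AnnotationFinder(annotations)
    finder.feed(html)
    finder.close()
    return finder.found


SAMPLE_HTML = (
    '<p id="mwBA">Intro paragraph.</p>'
    '<p id="mwBQ">A novel about World War III.</p>'
)

if __name__ == "__main__":
    print(find_annotations(SAMPLE_HTML))
```

The lookup only works while the ids in the store still match the ids in the
rendered HTML, which is exactly the stability question discussed in this
thread.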
>>>
>>> As for stability, we don't currently guarantee it. This has come up
>>> previously ( https://phabricator.wikimedia.org/T116350 ), but we haven't
>>> tackled it: there hasn't been a compelling use case that would
>>> benefit immediately from it, and we cannot reliably guarantee that the ids
>>> will remain stable across a series of wikitext edits.
>>>
>>> But, on an edit-to-edit basis, Parsoid already does DOM diffs and
>>> identifies only the edited portions of the DOM (this is used internally
>>> to support no-dirty-diff serialization of edited HTML to wikitext).
>>> However, this functionality is not currently exposed outside of internal
>>> Parsoid use.
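
Since the DOM diff mentioned above is internal to Parsoid, the following is
only a rough stand-in for the idea of finding which stored annotations an
edit touches. It assumes that element ids survive the edit (which, as noted
in this thread, is not guaranteed) and that each revision has already been
reduced to a mapping from element id to text content. The function name and
the status labels are hypothetical.

```python
def classify_annotations(old_elements, new_elements, annotated_ids):
    """Classify annotated element ids after an edit.

    old_elements / new_elements: dicts mapping element id -> text content
    for the revision before and after the edit (assumed inputs).
    Returns a dict mapping each annotated id to one of
    "unchanged", "changed" or "missing".
    """
    status = {}
    for element_id in annotated_ids:
        if element_id not in new_elements:
            # The element was removed, or its id was reassigned.
            status[element_id] = "missing"
        elif old_elements.get(element_id) != new_elements[element_id]:
            status[element_id] = "changed"
        else:
            status[element_id] = "unchanged"
    return status


if __name__ == "__main__":
    before = {"mwBA": "Intro.", "mwBQ": "A novel about World War III."}
    after = {"mwBA": "Intro.", "mwBQ": "A film about World War III."}
    print(classify_annotations(before, after, ["mwBA", "mwBQ", "mwBg"]))
```

Annotations reported as "changed" or "missing" would then need review or
re-anchoring; those reported as "unchanged" can be kept as they are.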
>>>
>>> This doesn't answer your questions directly, but I hope this is at least
>>> in the direction of what you are looking for.
>>>
>>> Subbu.
>>>
>>>
>>> On 10/28/2015 06:31 AM, André Costa wrote:
>>>
>>> I have some general Parsoid questions I hoped someone here might help me
>>> with.
>>>
>>> The background is that we are doing some preliminary work looking at how
>>> Text-to-Speech might work on Wikipedia (there will be some info online in
>>> the coming weeks).
>>>
>>> One detail of this is that you might occasionally have to highlight
>>> specific words/sentences that are dealt with differently (e.g. World War
>>> III -> World War 3). It is still unclear how frequent such things would be
>>> but if they are very frequent then there would likely be push-back from the
>>> community if this is stored in the normal wikitext.
>>>
>>> In this case we would have to store the markup outside of the wikitext,
>>> and any viewing/editing of it would have to happen in some user-enabled
>>> extension of the normal environment.
>>>
>>> And here we come to the questions.
>>> 1. If we had to store this markup outside of the wikitext, could
>>> this be done by storing the individual parsoid-data-units?
>>> 2. Would it be possible to add these units to the existing parsoid-data
>>> (which gets loaded from the wikitext) when loading a page?
>>> 3. Would it be possible to detect which of these units would be affected
>>> by edits to the wikipage?
>>>
>>> This is still in the early stages so mainly we are looking at what
>>> possibilities exist should we need them. Using Parsoid data was something
>>> we thought of as a light-weight solution to having to store a synced copy
>>> of the wikitext+additional markup.
>>>
>>> Cheers,
>>> André
>>> André Costa | GLAM technician, Wikimedia Sverige |
>>> [email protected] | +46 (0)733-964574
>>>
>>> Support free knowledge, become a member of Wikimedia Sverige.
>>> Read more at blimedlem.wikimedia.se
>>>
>>>
>>>
>>> _______________________________________________
>>> Wikitext-l mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/wikitext-l
>>>
>>
>
>
> --
> Gabriel Wicke
> Principal Engineer, Wikimedia Foundation
>
