Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-20 Thread Gabriel Wicke
On 05/20/2014 02:46 AM, Daniel Kinzler wrote:
> My main reason for recycling the <html> tag was to not introduce a new tag
> extension. <domparse> may occur verbatim in existing wikitext, and would break
> when the tag is introduced.

The only existing mentions of this are probably us discussing it ;) In any
case, it's easy to grep for it & nowikify existing uses.

> Other than that, I'm fine with outputting whatever tag you like for the
> transclusion.

Great!

> Implementing the tag is something else, though - I could implement
> it so it will work for HTML transclusion, but I'm not sure I understand the
> original domparse stuff well enough to get that right. Would domparse be in
> core, btw?

Yes, it should be in core. I believe that a very simple implementation
(without actual DOM balancing, using Parser::recursiveTagParse()) would not
be too hard. The guts of it are described in [1]. The limitations of
recursiveTagParse should not matter much for this use case.
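
For illustration, a minimal sketch along the lines of [1] (hook shape only; a
real implementation would add the DOM balancing and attribute handling
discussed here):

  // Register <domparse> as an ordinary tag extension.
  $wgHooks['ParserFirstCallInit'][] = function ( Parser $parser ) {
      $parser->setHook( 'domparse',
          function ( $input, array $args, Parser $parser, PPFrame $frame ) {
              // Parse the wrapped wikitext as a unit; recursiveTagParse()
              // does no DOM balancing, which is the limitation noted above.
              return $parser->recursiveTagParse( $input, $frame );
          }
      );
      return true;
  };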

>> Now back to the syntax. Encoding complex transclusions in an HTML parameter
>> would be rather cumbersome, and would entail a lot of attribute-specific
>> escaping.
> 
> Why would it involve any escaping? It should be handled as a tag extension,
> like any other.

Transclusions can contain quotes, which need to be escaped in attribute
values to make sure that the attribute is in fact an attribute. Since quotes
tend to be more common than </domparse> tags, this means that there's going to
be more escaping. I also find it harder to scan for quotes ending a long
attribute value. Tags are easier to spot.
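
For example (with a hypothetical {{Quote}} template call), the attribute form
needs the inner quotes encoded:

  <html transclusion="{{Quote|She said &quot;hi&quot;}}">...</html>

while the tag form passes them through unchanged:

  <domparse>{{Quote|She said "hi"}}</domparse>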

>> $wgRawHtml is disabled in all wikis we are currently interested in.
>> MediaWiki does properly report the <html> extension tag from siteinfo when
>> $wgRawHtml is enabled, so it ought to work with Parsoid for private wikis.
>> It will be harder to support the <html transclusion=""> exception.
> 
> I should try what expandtemplates does with <html> with $wgRawHtml enabled.
> Nothing, probably. It will just come back containing raw HTML. Which would be
> fine, I think.

Yes, that case will work. But $wgRawHtml enabled is the exception, and not
something I'd like to encourage.

> By the way: once we agree on a mechanism, it would be trivial to use the same
> mechanism for special page transclusion. My patch actually already covers 
> that.
> Do you agree that this is the Right Thing? It's just transclusion of HTML
> content, after all.

Yes, that sounds good to me.

Gabriel

[1]:
https://www.mediawiki.org/wiki/Manual:Tag_extensions#How_do_I_render_wikitext_in_my_extension.3F


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-20 Thread Daniel Kinzler
On 19.05.2014 23:05, Gabriel Wicke wrote:
> I think we have agreement that some kind of tag is still needed. The main
> point still under discussion is on which tag to use, and how to implement
> this tag in the parser.

Indeed.

> Originally,  was conceived to be used in actual page content to
> wrap wikitext that is supposed to be parsed to a balanced DOM *as a unit*
> rather than transclusion by transclusion. Once unbalanced compound
> transclusion content is wrapped in  tags (manually or via bots
> using Parsoid info), we can start to enforce nesting of all other
> transclusions by default. This will make editing safer and more accurate,
> and improve performance by letting us reuse expansions and avoid
> re-rendering the entire page during refreshLinks. See
> https://bugzilla.wikimedia.org/show_bug.cgi?id=55524 for more background.


Ah, I thought you just pulled that out of your hat :)

My main reason for recycling the <html> tag was to not introduce a new tag
extension. <domparse> may occur verbatim in existing wikitext, and would break
when the tag is introduced.

Other than that, I'm fine with outputting whatever tag you like for the
transclusion. Implementing the tag is something else, though - I could implement
it so it will work for HTML transclusion, but I'm not sure I understand the
original domparse stuff well enough to get that right. Would domparse be in
core, btw?


> Now back to the syntax. Encoding complex transclusions in an HTML parameter
> would be rather cumbersome, and would entail a lot of attribute-specific
> escaping.

Why would it involve any escaping? It should be handled as a tag extension, like
any other.

> $wgRawHtml is disabled in all wikis we are currently interested in.
> MediaWiki does properly report the <html> extension tag from siteinfo when
> $wgRawHtml is enabled, so it ought to work with Parsoid for private wikis.
> It will be harder to support the <html transclusion=""> exception.

I should try what expandtemplates does with <html> with $wgRawHtml enabled.
Nothing, probably. It will just come back containing raw HTML. Which would be
fine, I think.

By the way: once we agree on a mechanism, it would be trivial to use the same
mechanism for special page transclusion. My patch actually already covers that.
Do you agree that this is the Right Thing? It's just transclusion of HTML
content, after all.

-- daniel



Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Gabriel Wicke
On 05/19/2014 12:46 PM, Daniel Kinzler wrote:
> On 19.05.2014 20:01, Gabriel Wicke wrote:
>> On 05/19/2014 10:55 AM, Bartosz Dziewoński wrote:
>>> I am kind of lost in this discussion, but let me just ask one question.
>>>
>>> Won't all of the proposed solutions, other than the one of just not
>>> expanding transclusions that can't be expanded to wikitext, break the
>>> original and primary purpose of ExpandTemplates: providing valid parsable
>>> wikitext, for understanding by humans and for pasting back into articles in
>>> order to bypass transclusion limits?
>> 
>> Yup. But that's the case with <domparse>, while it's not the case with
>> <html> unless $wgRawHtml is true (which is impossible for publicly-editable
>> wikis).
> 
> <html transclusion="{{T}}"> would work transparently. It would contain HTML,
> for direct use by the client, and could be passed back to the parser, which
> would ignore the HTML and execute the transclusion. It should be 100%
> compatible with existing clients (unless they look for verbatim "<html>" for
> some reason).

Currently <html> tags are escaped when $wgRawHtml is disabled. We could
change the implementation to stop doing so *iff* the transclusion parameter
is supplied, but IMO that would be fairly unexpected and inconsistent behavior.

>>> I feel that Parsoid should be using a separate API for whatever it's doing
>>> with the wikitext. I'm sure that would give you more flexibility with
>>> internal design as well.
>> 
>> We are moving towards that, but will still need to support unbalanced
>> transclusions for a while.
> 
> But for HTML based transclusions you could ignore that - you could already
> resolve these using a separate API call, if needed.

Yes, and they are going to be the common case once we have marked up the
exceptions with tags like <domparse>. As you correctly pointed out, inline
tags are primarily needed for expandtemplates calls on compound content,
which we need to do as long as we support unbalanced templates. We can't
know a priori whether some transclusions in turn transclude special HTML
content.

I think we have agreement that some kind of tag is still needed. The main
point still under discussion is on which tag to use, and how to implement
this tag in the parser.

Originally, <domparse> was conceived to be used in actual page content to
wrap wikitext that is supposed to be parsed to a balanced DOM *as a unit*
rather than transclusion by transclusion. Once unbalanced compound
transclusion content is wrapped in <domparse> tags (manually or via bots
using Parsoid info), we can start to enforce nesting of all other
transclusions by default. This will make editing safer and more accurate,
and improve performance by letting us reuse expansions and avoid
re-rendering the entire page during refreshLinks. See
https://bugzilla.wikimedia.org/show_bug.cgi?id=55524 for more background.

The use of <domparse> to mark up special HTML transclusions in
expandtemplates output will be temporary (until HTML transclusions are the
default), but even if such output is pasted into the actual wikitext it
would be harmless, and would work as expected.

Now back to the syntax. Encoding complex transclusions in an HTML parameter
would be rather cumbersome, and would entail a lot of attribute-specific
escaping. Wrapping such transclusions in <domparse> tags on the other hand
normally does not entail any escaping, as only nested </domparse> tags are
problematic.

> Parsoid would keep working as before: it would treat <html> as a tag extension
> (it does that, right?)

$wgRawHtml is disabled in all wikis we are currently interested in.
MediaWiki does properly report the <html> extension tag from siteinfo when
$wgRawHtml is enabled, so it ought to work with Parsoid for private wikis.
It will be harder to support the <html transclusion=""> exception.

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Daniel Kinzler
On 19.05.2014 20:01, Gabriel Wicke wrote:
> On 05/19/2014 10:55 AM, Bartosz Dziewoński wrote:
>> I am kind of lost in this discussion, but let me just ask one question.
>>
>> Won't all of the proposed solutions, other than the one of just not
>> expanding transclusions that can't be expanded to wikitext, break the
>> original and primary purpose of ExpandTemplates: providing valid parsable
>> wikitext, for understanding by humans and for pasting back into articles in
>> order to bypass transclusion limits?
> 
> Yup. But that's the case with <domparse>, while it's not the case with
> <html> unless $wgRawHtml is true (which is impossible for publicly-editable
> wikis).

<html transclusion="{{T}}"> would work transparently. It would contain HTML, for
direct use by the client, and could be passed back to the parser, which would
ignore the HTML and execute the transclusion. It should be 100% compatible with
existing clients (unless they look for verbatim "<html>" for some reason).

I'll have to re-read Gabriel's <domparse> proposal tomorrow - right now, I don't
see why it would be necessary, or how it would improve the situation.

>> I feel that Parsoid should be using a separate API for whatever it's doing
>> with the wikitext. I'm sure that would give you more flexibility with
>> internal design as well.
> 
> We are moving towards that, but will still need to support unbalanced
> transclusions for a while.

But for HTML based transclusions you could ignore that - you could already
resolve these using a separate API call, if needed.

But still - I do not see why that would be necessary. If expandtemplates returns
<html transclusion="{{T}}">...</html>, clients can pass that back to the parser safely, or
use the contained HTML directly, safely.

Parsoid would keep working as before: it would treat <html> as a tag extension
(it does that, right?) and pass it back to the parser (which would expand it
again, this time fully, if action=parse is used). If parsoid knows about the
special properties of <html>, it could just use the contents verbatim - I see no
reason why that would be any more unsafe than any other HTML returned by the
parser.

But perhaps I'm missing something obvious. I'll re-read the proposal tomorrow.

-- daniel



Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Bartosz Dziewoński

I am kind of lost in this discussion, but let me just ask one question.

Won't all of the proposed solutions, other than the one of just not expanding 
transclusions that can't be expanded to wikitext, break the original and 
primary purpose of ExpandTemplates: providing valid parsable wikitext, for 
understanding by humans and for pasting back into articles in order to bypass 
transclusion limits?

I feel that Parsoid should be using a separate API for whatever it's doing with 
the wikitext. I'm sure that would give you more flexibility with internal 
design as well.

--
Matma Rex


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Gabriel Wicke
On 05/19/2014 10:55 AM, Bartosz Dziewoński wrote:
> I am kind of lost in this discussion, but let me just ask one question.
> 
> Won't all of the proposed solutions, other than the one of just not
> expanding transclusions that can't be expanded to wikitext, break the
> original and primary purpose of ExpandTemplates: providing valid parsable
> wikitext, for understanding by humans and for pasting back into articles in
> order to bypass transclusion limits?

Yup. But that's the case with <domparse>, while it's not the case with
<html> unless $wgRawHtml is true (which is impossible for publicly-editable
wikis).

> I feel that Parsoid should be using a separate API for whatever it's doing
> with the wikitext. I'm sure that would give you more flexibility with
> internal design as well.

We are moving towards that, but will still need to support unbalanced
transclusions for a while. Since special transclusions can be nested inside
of those we will need some form of inline support even if we expand most
transclusions all the way to DOM with a different end point. Also, as Daniel
pointed out, most other users are using action=expandtemplates for entire
pages and expect that to work as well.

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Gabriel Wicke
On 05/19/2014 10:19 AM, Gabriel Wicke wrote:
> On 05/19/2014 04:54 PM, Gabriel Wicke wrote:
> The move to HTML-based (self-contained) transclusion expansions will avoid
>> this issue completely. That's a few months out though. Maybe we can find a
>> stop-gap solution that moves in that direction, without introducing special
>> tags in expandtemplates that we'll have to support for a long time.
> 
> Here's a proposal:
> 
> * Introduce a <domparse> extension tag that causes its content to be parsed
> all the way to a self-contained DOM structure. Example:
> <domparse>{{T}}</domparse>
> 
> * Emit this tag for HTML page transclusions. Avoids the security issue as
> there's no way to inject verbatim HTML. Works with Parsoid out of the box.
> 
> * Use <domparse> to support parsing unbalanced templates by inserting it
> into wikitext:
> 
> <domparse>
> {{table-start}}
> {{table-row}}
> {{table-end}}
> </domparse>
> 
> * Build a solid HTML-only expansion API end point, and start using that for
> all transclusions that are not wrapped in <domparse>
> 
> * Stop wrapping non-wikitext transclusions into <domparse> in
> action=expandtemplates once those can be directly expanded to a
> self-contained DOM.

Here's a possible division of labor:

You (Daniel) could start with the second step (emitting the tag). Since not
much escaping is needed (only nested </domparse> tags in the transclusion)
this should be fairly straightforward.

We could work on the extension implementation (first bullet point) together,
or tackle it completely on the Parsoid side. We planned to work on this in
any case as part of our longer-term migration to well-balanced HTML
transclusions.

The advantage of using <domparse> to support both unbalanced templates &
special transclusions is that we'll only have to implement this once, and
won't introduce another tag only to deprecate it fairly quickly. Phasing out
unbalanced templates will take longer, as we'll first have to come up with
alternative means to support the same use cases.

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Gabriel Wicke
On 05/19/2014 04:54 PM, Gabriel Wicke wrote:
> The move to HTML-based (self-contained) transclusion expansions will avoid
> this issue completely. That's a few months out though. Maybe we can find a
> stop-gap solution that moves in that direction, without introducing special
> tags in expandtemplates that we'll have to support for a long time.

Here's a proposal:

* Introduce a <domparse> extension tag that causes its content to be parsed
all the way to a self-contained DOM structure. Example:
<domparse>{{T}}</domparse>

* Emit this tag for HTML page transclusions. Avoids the security issue as
there's no way to inject verbatim HTML. Works with Parsoid out of the box.

* Use <domparse> to support parsing unbalanced templates by inserting it
into wikitext:

<domparse>
{{table-start}}
{{table-row}}
{{table-end}}
</domparse>

* Build a solid HTML-only expansion API end point, and start using that for
all transclusions that are not wrapped in <domparse>

* Stop wrapping non-wikitext transclusions into <domparse> in
action=expandtemplates once those can be directly expanded to a
self-contained DOM.
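
For concreteness, the second step would turn e.g.

  == Foo ==
  {{T}}

into

  == Foo ==
  <domparse>{{T}}</domparse>

in action=expandtemplates output (assuming T has an HTML content model), which
is still valid wikitext and can be pasted back into a page.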

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Gabriel Wicke
On 05/19/2014 09:52 AM, Daniel Kinzler wrote:
> On 18.05.2014 16:29, Gabriel Wicke wrote:
>> The difference between wrapper and property is actually that using inline
>> wrappers in the returned wikitext would force us to escape similar wrappers
>> from normal template content to avoid opening a gaping XSS hole.
> 
> Please explain, I do not see the hole you mention.
> 
> If the input contained evil stuff, it would just get escaped by the
> preprocessor (unless $wgRawHtml is enabled), as it is now:
> https://de.wikipedia.org/w/api.php?action=expandtemplates&text=%3Chtml%3E%3Cscript%3Ealert%28%27evil%27%29%3C/script%3E%3C/html%3E

What you see there is just unescaped HTML embedded in the XML result format.
It's clearer that there's in fact no escaping on the HTML when looking at
the JSON:

https://de.wikipedia.org/w/api.php?action=expandtemplates&text=%3Chtml%3E%3Cscript%3Ealert%28%27evil%27%29%3C/script%3E%3C/html%3E&format=json
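
The response body (abbreviated, shape from memory of the 2014-era API) is
roughly:

  {"expandtemplates":{"*":"<html><script>alert('evil')</script></html>"}}

i.e. the <html> tags and the script come back verbatim, unescaped.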

Parsoid depends on there being no escaping for unknown tags (and known
extension tags) in the preprocessor.

So if you use <html> tags, you'll have to add escaping for those.

The move to HTML-based (self-contained) transclusion expansions will avoid
this issue completely. That's a few months out though. Maybe we can find a
stop-gap solution that moves in that direction, without introducing special
tags in expandtemplates that we'll have to support for a long time.

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Daniel Kinzler
On 19.05.2014 14:21, Subramanya Sastry wrote:
> On 05/19/2014 04:52 AM, Daniel Kinzler wrote:
>> I'm getting the impression there is a fundamental misunderstanding here.
> 
> You are correct. I completely misunderstood what you said in your last 
> response
> about expandtemplates. So, the rest of my response to your last email is
> irrelevant ... and let me reboot :-).

Glad we got that out of the way :)

> On 05/17/2014 06:14 PM, Daniel Kinzler wrote:
>> I think something like <html transclusion="{{T}}" model="whatever">...</html>
>> would work best.
> 
> I see what you are getting at here. Parsoid can treat this like a regular
> tag-extension and send it back to the api=parse endpoint for processing.
> Except if you provided the full expansion as the content of the html-wrapper
> in which case the extra api call can be skipped. The extra api call is not
> really an issue for occasional uses, but on pages with a lot of non-wikitext
> transclusion uses, this is an extra api call for each such use. I don't have
> a sense for how common this would be, so maybe that is a premature worry.

I would probably go for always including the expanded HTML for now.

> That said, for other clients, this content would be deadweight (if they are
> going to discard it and go back to the api=parse endpoint anyway or worse send
> back the entire response to the parser that is going to just discard it after
> the network transfer).

Yes. There could be an option to omit it. That makes the implementation more
complex, but it's doable.

> So, looks like there are some conflicting perf. requirements for different
> clients wrt expandtemplates response here. In that context, at least from a
> solely parsoid-centric point of view, the new api endpoint 'expand=Foo|x|y|z'
> you proposed would work well as well.

That seems the cleanest solution for the parsoid use case - however, the
implementation is complicated by how parameter substitution works. For HTML
based transclusion, it doesn't work at all at the moment - we would need tighter
integration with the preprocessor for doing that.

Basically, there would be two cases: convert expand=Foo|x|y|z to {{Foo|x|y|z}}
internally and call Parser::preprocess on that, so parameter substitution is done
correctly; or get the HTML from Foo, and discard the parameters. We would have
to somehow know in advance which mode to use, handle the appropriate case, and
then set the Content-Type header accordingly. Pretty messy...
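
Roughly, the two cases might look like this (sketch only: $parser and $popts
are assumed to be in scope, and the instanceof check stands in for whatever
dispatch the real implementation would need):

  $call = $params['expand'];  // e.g. "Foo|x|y|z"
  $title = Title::newFromText( explode( '|', $call )[0] );
  $content = WikiPage::factory( $title )->getContent();

  if ( $content instanceof WikitextContent ) {
      // Case 1: let the preprocessor do parameter substitution.
      $text = $parser->preprocess( '{{' . $call . '}}', $title, $popts );
      $contentType = 'text/x-wiki';
  } else {
      // Case 2: take the HTML from the Content object; parameters are discarded.
      $text = $content->getParserOutput( $title )->getText();
      $contentType = 'text/html';
  }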

I think <html transclusion="{{T}}"> is the simplest and most robust solution for
now.

-- daniel



Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Subramanya Sastry

On 05/19/2014 04:52 AM, Daniel Kinzler wrote:

I'm getting the impression there is a fundamental misunderstanding here.


You are correct. I completely misunderstood what you said in your last 
response about expandtemplates. So, the rest of my response to your last 
email is irrelevant ... and let me reboot :-).



> > All that said, if you want to provide the wrapper with <html
> > transclusion="{{T}}" model="whatever">fully-expanded-HTML</html>, we can
> > handle that as well. We'll use the model attribute of the wrapper, discard
> > the wrapper and use the contents in our pipeline.
> 
> Why use the model attribute? Why would you care about the original model? All
> you need to know is that you'll get HTML. Exposing the original model in this
> context seems useless if not misleading.

Given that I misunderstood your larger observation about expandtemplates, this
is not relevant now. But, I was basing this on your proposal from the previous
email which I'll now go back to.


On 05/17/2014 06:14 PM, Daniel Kinzler wrote:
> I think something like <html transclusion="{{T}}" model="whatever">...</html>
> would work best.


I see what you are getting at here. Parsoid can treat this like a 
regular tag-extension and send it back to the api=parse endpoint for 
processing. Except if you provided the full expansion as the content of 
the html-wrapper in which case the extra api call can be skipped. The 
extra api call is not really an issue for occasional uses, but on pages 
with a lot of non-wikitext transclusion uses, this is an extra api call 
for each such use. I don't have a sense for how common this would be, so 
maybe that is a premature worry.


That said, for other clients, this content would be deadweight (if they 
are going to discard it and go back to the api=parse endpoint anyway or 
worse send back the entire response to the parser that is going to just 
discard it after the network transfer).


So, looks like there are some conflicting perf. requirements for 
different clients wrt expandtemplates response here. In that context, at 
least from a solely parsoid-centric point of view, the new api endpoint 
'expand=Foo|x|y|z' you proposed would work well as well.


Subbu.


> > A separate property in the JSON/XML structure avoids the need for escaping
> > (and associated security risks if not done thoroughly), and should be
> > relatively straightforward to implement and consume.
> 
> As explained above, I do not see how this would work except for the very
> special case of using expandtemplates to expand just a single template. This
> could be solved by introducing a new, single-template mode for
> expandtemplates, e.g. using expand="Foo|x|y|z" instead of
> text="{{Foo|x|y|z}}".
> 
> Another way would be to use hints in the structure returned by generatexml.
> There, we have an opportunity to declare a content type for a *part* of the
> output (or rather, input).



Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Daniel Kinzler
I'm getting the impression there is a fundamental misunderstanding here.

On 18.05.2014 04:28, Subramanya Sastry wrote:
> So, consider this wikitext for page P.
> 
> == Foo ==
> {{wikitext-transclusion}}
>   *a1
>  <map>...</map>
>   *a2
> {{T}} (the html-content-model-transclusion)
>   *a3
> 
> Parsoid gets wikitext from the API for {{wikitext-transclusion}}, parses it
> and injects the tokens into the P's content. Parsoid gets HTML from the API
> for <map>...</map> and injects the HTML into the not-fully-processed wikitext
> of P (by adding an appropriate token wrapper). So, if {{T}} returns HTML
> (i.e. the MW API lets Parsoid know that it is HTML), Parsoid can inject the
> HTML into the
> not-fully-processed wikitext and ensure that the final output comes out right
> (in this case, the HTML from both the map extension and {{T}} would not get
> sanitized as it should be).
> 
> Does that help explain why we said we don't need the html wrapper?

No, it actually misses my point completely. My point is that this may work with
the way parsoid uses expandtemplates, but it does not work for expandtemplates
in general. Because expandtemplates takes full wikitext as input, and only
partially replaces it.

So, let me phrase it this way:

If expandtemplates is called with text=

   == Foo ==
   {{T}}

   [[Category:Bla]]

What should it return, and what content type should be declared in the http 
header?

Note that I'm not talking about how parsoid processes this text. That's not my
point - my point is that expandtemplates can be and is used on full wikitext. In
that context, the return type cannot be HTML.

> All that said, if you want to provide the wrapper with <html
> transclusion="{{T}}" model="whatever">fully-expanded-HTML</html>, we can
> handle that as well. We'll use the model attribute of the wrapper, discard
> the wrapper and use the contents in our pipeline.

Why use the model attribute? Why would you care about the original model? All
you need to know is that you'll get HTML. Exposing the original model in this
context seems useless if not misleading.

> The difference between wrapper and property is actually that using inline
> wrappers in the returned wikitext would force us to escape similar wrappers
> from normal template content to avoid opening a gaping XSS hole.

Please explain, I do not see the hole you mention.

If the input contained evil stuff, it would just get escaped by the
preprocessor (unless $wgRawHtml is enabled), as it is now:
https://de.wikipedia.org/w/api.php?action=expandtemplates&text=%3Chtml%3E%3Cscript%3Ealert%28%27evil%27%29%3C/script%3E%3C/html%3E

If <html transclusion="{{T}}">...</html> was passed, the parser/preprocessor
would treat it like it would treat {{T}} - it would get trusted,
backend-generated HTML from the respective Content object.

I see no change, and no opportunity to inject anything. Am I missing something?

> A separate property in the JSON/XML structure avoids the need for escaping
> (and associated security risks if not done thoroughly), and should be
> relatively straightforward to implement and consume.

As explained above, I do not see how this would work except for the very special
case of using expandtemplates to expand just a single template. This could be
solved by introducing a new, single-template mode for expandtemplates, e.g.
using expand="Foo|x|y|z" instead of text="{{Foo|x|y|z}}".
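
In API terms, the two request shapes would be (the expand parameter being the
hypothetical new mode):

  api.php?action=expandtemplates&text={{Foo|x|y|z}}   (current)
  api.php?action=expandtemplates&expand=Foo|x|y|z     (proposed)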

Another way would be to use hints in the structure returned by generatexml. There,
we have an opportunity to declare a content type for a *part* of the output (or
rather, input).

-- daniel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-18 Thread Gabriel Wicke
On 05/18/2014 02:28 AM, Subramanya Sastry wrote:
> However, in his previous message, Gabriel indicated that
> a property in the JSON/XML response structure might work better for
> multi-part responses.

The difference between wrapper and property is actually that using inline
wrappers in the returned wikitext would force us to escape similar wrappers
from normal template content to avoid opening a gaping XSS hole.

A separate property in the JSON/XML structure avoids the need for escaping
(and associated security risks if not done thoroughly), and should be
relatively straightforward to implement and consume.

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-17 Thread Subramanya Sastry

On 05/17/2014 06:14 PM, Daniel Kinzler wrote:

On 17.05.2014 17:57, Subramanya Sastry wrote:

On 05/17/2014 10:51 AM, Subramanya Sastry wrote:

So, going back to your original implementation, here are at least 3 ways I see
this working:

2. action=expandtemplates returns a <html>...</html> for the expansion of
{{T}}, but also provides an additional API response header that tells Parsoid
that T was a special content model page and that the raw HTML that it received
should not be sanitized.

Actually, the <html>...</html> wrapper is not even required here since the new API
response header (for example, X-Content-Model: HTML) is sufficient to know what
to do with the response body.

But that would only work if {{T}} was the whole text that was being expanded
(I guess that's what you do with parsoid, right? Took me a minute to realize
that).
expandtemplates operates on full wikitext. If the input is something like

   == Foo ==
   {{T}}

   [[Category:Bla]]

Then expanding {{T}} without a wrapper and pretending the result was HTML would
just be wrong.


Parsoid handles this correctly. We have mechanisms for injecting HTML as 
well as wikitext into the toplevel page. For example, tag extensions 
currently return fully expanded html (we use action=parse API endpoint) 
and we inject that HTML into the page. So, consider this wikitext for 
page P.


== Foo ==
{{wikitext-transclusion}}
  *a1
 <map>...</map>
  *a2
{{T}} (the html-content-model-transclusion)
  *a3

Parsoid gets wikitext from the API for {{wikitext-transclusion}}, parses 
it and injects the tokens into the P's content. Parsoid gets HTML from 
the API for <map>...</map> and injects the HTML into the
not-fully-processed wikitext of P (by adding an appropriate token 
wrapper). So, if {{T}} returns HTML (i.e. the MW API lets Parsoid know 
that it is HTML), Parsoid can inject the HTML into the 
not-fully-processed wikitext and ensure that the final output comes out 
right (in this case, the HTML from both the map extension and {{T}} 
would not get sanitized as it should be).


Does that help explain why we said we don't need the html wrapper?

All that said, if you want to provide the wrapper with <html
transclusion="{{T}}" model="whatever">fully-expanded-HTML</html>, we can
handle that as well. We'll use the model attribute of the wrapper, discard the
wrapper and use the contents in our pipeline.


So, model information either as an attribute on the wrapper, api 
response header, or a property in the JSON/XML response structure would 
all work for us. I don't have clarity on which of these three is the 
best mechanism for providing the template-page content-model information 
to clients .. so till such time I understand that better, I dont have an 
opinion about the specific mechanism. However, in his previous message, 
Gabriel indicated that a property in the JSON/XML response structure 
might work better for multi-part responses.


Subbu.


> Regarding trusting the output: MediaWiki core trusts the generated HTML for
> direct output. It's no different from the HTML generated by e.g. special
> pages in that regard.
>
> I think something like <html transclusion="{{T}}" model="whatever">...</html>
> would work best.
>
> -- daniel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-17 Thread Daniel Kinzler
On 17.05.2014 17:57, Subramanya Sastry wrote:
> On 05/17/2014 10:51 AM, Subramanya Sastry wrote:
>> So, going back to your original implementation, here are at least 3 ways I
>> see this working:
>>
>> 2. action=expandtemplates returns a <html>...</html> for the expansion of
>> {{T}}, but also provides an additional API response header that tells
>> Parsoid that T was a special content model page and that the raw HTML that
>> it received should not be sanitized.
> 
> Actually, the <html>...</html> wrapper is not even required here since the
> new API response header (for example, X-Content-Model: HTML) is sufficient
> to know what to do with the response body.

 But that would only work if {{T}} was the whole text that was being expanded (I
guess that's what you do with parsoid, right? Took me a minute to realize that).
expandtemplates operates on full wikitext. If the input is something like

  == Foo ==
  {{T}}

  [[Category:Bla]]

Then expanding {{T}} without a wrapper and pretending the result was HTML would
just be wrong.

Regarding trusting the output: MediaWiki core trusts the generated HTML for
direct output. It's no different from the HTML generated by e.g. special pages
in that regard.

I think something like <html transclusion="{{T}}" model="whatever">...</html>
would work best.

-- daniel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-17 Thread Gabriel Wicke
On 05/17/2014 05:57 PM, Subramanya Sastry wrote:
> On 05/17/2014 10:51 AM, Subramanya Sastry wrote:
>> So, going back to your original implementation, here are at least 3 ways I
>> see this working:
>>
>> 2. action=expandtemplates returns a <html>...</html> for the expansion of
>> {{T}}, but also provides an additional API response header that tells
>> Parsoid that T was a special content model page and that the raw HTML that
>> it received should not be sanitized.
> 
> Actually, the <html>...</html> wrapper is not even required here since the new
> API response header (for example, X-Content-Model: HTML) is sufficient to
> know what to do with the response body.

Indeed.

Also, instead of the header we can just set a property / attribute in the
JSON/XML response structure. This will also work for multi-part responses,
for example when calling action=expandtemplates on multiple titles.
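
For instance (property name purely illustrative), each expansion could carry
its model alongside the text:

  {"expandtemplates":{"*":"...expanded HTML...","contentmodel":"html"}}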

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-17 Thread Subramanya Sastry

On 05/17/2014 10:51 AM, Subramanya Sastry wrote:
So, going back to your original implementation, here are at least 3 
ways I see this working:


2. action=expandtemplates returns a <html>...</html> for the expansion
of {{T}}, but also provides an additional API response header that 
tells Parsoid that T was a special content model page and that the raw 
HTML that it received should not be sanitized.


Actually, the <html>...</html> wrapper is not even required here since the
new API response header (for example, X-Content-Model: HTML) is 
sufficient to know what to do with the response body.


Subbu.


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-17 Thread Subramanya Sastry
(Top posting to quickly summarize what I gathered from the discussion 
and what would be required for Parsoid to expand pages with these 
transclusions).


Parsoid currently relies on the mediawiki API to preprocess 
transclusions and return wikitext (uses action=expandtemplates for this) 
which it then parses using native Parsoid pipeline.  Parsoid processes 
extension tags via action=parse and weaves the result back into the 
top-level content of the page.
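
For example (illustrative calls; parameters abbreviated, <myext> standing in
for an arbitrary extension tag):

  api.php?action=expandtemplates&text={{T}}&format=json     (transclusions)
  api.php?action=parse&text=<myext>...</myext>&format=json  (extension tags)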


As per your original email, I am assuming the T is a page with a special 
content model that generates HTML and another page P has a transclusion 
{{T}}.


So, when Parsoid encounters {{T}}, it should be able to replace {{T}} 
with the HTML to generate the right parse output for P.


So, I am listing below 4 possible ways action=expandtemplates can 
process {{T}}


1. Your newest implementation (that just returns back {{T}}):

* If Parsoid gets back {{T}}, one of two things can happen:
--- Parsoid, as usual, tries to parse it as wikitext, and it gets stuck 
in an infinite loop (query MW api for expansion of {{T}}, get back 
{{T}}, parse it as {{T}}, query MW api for expansion of {{T}}, ...).
So, this will definitely not work.
--- Parsoid adds a special case check to see if the API sent back {{T}}, 
and in which case, requires a different API endpoint 
(action=expandtohtml maybe?) to send back the html expansion based on 
the assumption about output of expandtemplates. This would work and 
would require the new endpoint to be implemented, but feels hacky.


So, going back to your original implementation, here are at least 3 ways 
I see this working:


2. action=expandtemplates returns a <html>...</html> for the expansion
of {{T}}, but also provides an additional API response header that tells 
Parsoid that T was a special content model page and that the raw HTML 
that it received should not be sanitized.


3. action=expandtemplates returns <html>...</html> for the expansion of
{{T}} and no other indication about T being a special content model page 
or not. However, if Parsoid (and other clients) are to trust these html 
output always without sanitization, expandtemplates implementation 
should have a conditional sanitization of <html> tags encountered in
wikitext to prevent XSS. As far as I understand, expandtemplates (on 
master, not your patch) does not do this tag sanitization. But, 
independent of that, what Parsoid and clients need is a guarantee that 
it is safe to blindly splice the contents of any <html>...</html> it
receives for any {{T}}, no matter what content model T implements.


4. Parsoid first queries the MW-api to find out the content model of T 
for every transclusion {{T}} it encounters on the page P and based on 
the content-model info, knows how to process the output of 
action=expandtemplates.


Clearly 4. is expensive and 3. seems hacky, but if it can be made to 
work, we can work with that.


But, both Gabriel and I think that solution 2. is the cleanest solution 
for now that would work. The PHP parser (in your patch to handle {{T}}) 
already has information about the content model of T when it is 
expanding {{T}} and it seems simplest and cleanest to return this 
information back to clients in the non-default content-model
expansions. That gives clients like Parsoid the cleanest way of handling 
these.


If I am missing something or this is unclear, and this is getting into too
much back and forth on email and it is simpler to discuss this on IRC, I 
can hop onto any IRC channel on Monday or we can do this on 
#mediawiki-parsoid, and one of us could later summarize the discussion 
back onto this thread.


Thanks,
Subbu.


On 05/17/2014 02:54 AM, Daniel Kinzler wrote:

> On 16.05.2014 21:07, Gabriel Wicke wrote:
>> On 05/15/2014 04:42 PM, Daniel Kinzler wrote:
>>> The one thing that will not work on wikis with
>>> $wgRawHtml disabled is parsing the output of expandtemplates.
>>
>> Yes, which means that it won't work with Parsoid, Flow, VE and other users.
>
> And it has been fixed now. In the latest version, expandtemplates will just
> return {{Foo}} as it was if {{Foo}} can't be expanded to wikitext.
>
>> I do think that we can do better, and I pointed out possible ways to do so
>> in my earlier mail:
>>
>>> My preference
>>> would be to let the consumer directly ask for pre-expanded wikitext *or*
>>> HTML, without overloading action=expandtemplates. Even indicating the
>>> content type explicitly in the API response (rather than inline with an HTML
>>> tag) would be a better stop-gap as it would avoid some of the security and
>>> compatibility issues described above.
>
> I don't quite understand what you are asking for... action=parse returns HTML,
> action=expandtemplates returns wikitext. The issue was with "mixed" output,
> that is, representing the expansion of templates that generate HTML in
> wikitext. The solution I'm going for now is to simply not expand them.
>
> -- daniel






Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-17 Thread Daniel Kinzler
On 16.05.2014 21:07, Gabriel Wicke wrote:
> On 05/15/2014 04:42 PM, Daniel Kinzler wrote:
>> The one thing that will not work on wikis with
>> $wgRawHtml disabled is parsing the output of expandtemplates.
> 
> Yes, which means that it won't work with Parsoid, Flow, VE and other users.

And it has been fixed now. In the latest version, expandtemplates will just
return {{Foo}} as it was if {{Foo}} can't be expanded to wikitext.

> I do think that we can do better, and I pointed out possible ways to do so
> in my earlier mail:
> 
>> My preference
>> would be to let the consumer directly ask for pre-expanded wikitext *or*
>> HTML, without overloading action=expandtemplates. Even indicating the
>> content type explicitly in the API response (rather than inline with an HTML
>> tag) would be a better stop-gap as it would avoid some of the security and
>> compatibility issues described above.

I don't quite understand what you are asking for... action=parse returns HTML,
action=expandtemplates returns wikitext. The issue was with "mixed" output, that
is, representing the expansion of templates that generate HTML in wikitext. The
solution I'm going for now is to simply not expand them.

-- daniel



Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-16 Thread Gabriel Wicke
On 05/15/2014 04:42 PM, Daniel Kinzler wrote:
> The one thing that will not work on wikis with
> $wgRawHtml disabled is parsing the output of expandtemplates.

Yes, which means that it won't work with Parsoid, Flow, VE and other users.

I do think that we can do better, and I pointed out possible ways to do so
in my earlier mail:

> My preference
> would be to let the consumer directly ask for pre-expanded wikitext *or*
> HTML, without overloading action=expandtemplates. Even indicating the
> content type explicitly in the API response (rather than inline with an HTML
> tag) would be a better stop-gap as it would avoid some of the security and
> compatibility issues described above.

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-16 Thread Daniel Kinzler
Hi again!

I have rewritten the patch that enabled HTML based transclusion:

https://gerrit.wikimedia.org/r/#/c/132710/

I tried to address the concerns raised about my previous attempt, namely, how
HTML based transclusion is handled in expandtemplates, and how page meta data
such as resource modules get passed from the transcluded content to the main
parser output (this should work now).

For expandtemplates, I decided to just keep HTML based transclusions as they are
- including special page transclusions. So, expandtemplates will simply leave
{{Special:Foo}} and {{MediaWiki:Foo.js}} in the expanded text, while in the xml
output, you can still see them as template calls.

Cheers,
Daniel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-15 Thread Daniel Kinzler
On 14.05.2014 16:04, Gabriel Wicke wrote:
> On 05/14/2014 03:22 PM, Daniel Kinzler wrote:
>> My patch doesn't change the handling of <html>...</html> by the parser. As
>> before, the parser will pass HTML code in <html>...</html> through only if
>> $wgRawHtml is enabled, and will mangle/sanitize it otherwise.
> 
> 
> Oh, I thought that you wanted to support normal wikis with $wgRawHtml 
> disabled.

I want to, and I do. <html> is not used for normal rendering, it is used by
expandtemplates only. During normal rendering, a strip mark is inserted, which
will work on all wikis. The one thing that will not work on wikis with
$wgRawHtml disabled is parsing the output of expandtemplates.

-- daniel



Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-14 Thread Dan Andreescu
>
> > Can you outline how RL modules would be handled in the transclusion
> > scenario?
>
> The current patch does not really address that problem, I'm afraid. I can
> think of two solutions:
>
> * Create a SyntheticHtmlContent class that would hold meta info about
> modules etc, just like ParserOutput - perhaps it would just contain a
> ParserOutput object. And an equivalent SyntheticWikitextContent class,
> perhaps. That would allow us to pass such meta-info around as needed.
>
> * Move the entire logic for HTML based transclusion into the wikitext
> parser, where it can just call getParserOutput() on the respective Content
> object. We would then no longer need the generic infrastructure for HTML
> transclusion. Maybe that would be a better solution in the end.
>
> Hm... yes, I should make an alternative patch using that approach, so we
> can compare.
>

Thanks a lot Daniel, I'm happy to help test / try out any solutions you
want to experiment with.  I've moved my work to gerrit:
https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/Limn and
the last commit (with a lot of help from Matt F.) may be ready for you
to use as a use case.  Let me know if it'd be helpful to install this
somewhere in labs.

Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-14 Thread Gabriel Wicke
On 05/14/2014 03:22 PM, Daniel Kinzler wrote:
> My patch doesn't change the handling of <html>...</html> by the parser. As
> before, the parser will pass HTML code in <html>...</html> through only if
> $wgRawHtml is enabled, and will mangle/sanitize it otherwise.


Oh, I thought that you wanted to support normal wikis with $wgRawHtml disabled.

> The content type did not change. It's wikitext.

Anything is wikitext ;)

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-14 Thread Daniel Kinzler
On 14.05.2014 15:11, Gabriel Wicke wrote:
> On 05/14/2014 01:40 PM, Daniel Kinzler wrote:
>>> This means that HTML returned from the preprocessor needs to be valid in
>>> wikitext to avoid being stripped out by the sanitizer. Maybe that's actually
>>> possible, but my impression is that you are shooting for something that's
>>> closer to the behavior of a tag extension. Those already bypass the
>>> sanitizer, so would be less troublesome in the short term.
>>
>> Yes. Just treat <html>...</html> like a tag extension, and it should work
>> fine.
>> Do you see any problems with that?
> 
> First of all you'll have to make sure that users cannot inject <html> tags
> as that would enable arbitrary XSS. I might have missed it, but I believe
> that this is not yet done in your current patch.

My patch doesn't change the handling of <html>...</html> by the parser. As
before, the parser will pass HTML code in <html>...</html> through only if
$wgRawHtml is enabled, and will mangle/sanitize it otherwise.

My patch does mean however that the text return by expandtemplates may not
render as expected when processed by the parser. Perhaps anomie's approach of
preserving the original template call would work, something like:

  <html transclusion="{{T}}">...</html>

Then, the parser could apply the normal expansion when encountering the tag,
ignoring the pre-rendered HTML.
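
A rough sketch of that behavior as a tag hook (shape only - core's real <html>
handling lives in CoreTagHooks, and the attribute name just follows the
example above):

  $parser->setHook( 'html',
      function ( $input, array $args, Parser $parser, PPFrame $frame ) {
          if ( isset( $args['transclusion'] ) ) {
              // Re-expand the preserved call, e.g. "{{T}}", ignoring the
              // pre-rendered HTML body entirely.
              return $parser->recursiveTagParse( $args['transclusion'], $frame );
          }
          // No transclusion attribute: escape the body, as the parser does
          // today when $wgRawHtml is off.
          return htmlspecialchars( $input );
      }
  );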

> In contrast to normal tag extensions, <html> would also contain fully
> rendered HTML, and should not be piped through action=parse as is done in
> Parsoid for tag extensions (in absence of a direct tag extension expansion
> API end point). We and other users of the expandtemplates API will have to
> add special-case handling for this pseudo tag extension.

Handling for the <html> tag should already be in place, since it's part of the
core spec. The issue is only to know when to allow/trust such <html> tags, and
when to treat them as plain text (or like a <nowiki> tag).

> In HTML, the <html> tag is also not meant to be used inside the body of a
> page. I'd suggest using a different tag name to avoid issues with HTML
> parsers and potential name conflicts with existing tag extensions.

As above: <html> is part of the core syntax, to support $wgRawHtml. It's just
disabled by default.

> Overall it does not feel like a very clean way to do this. My preference
> would be to let the consumer directly ask for pre-expanded wikitext *or*
> HTML, without overloading action=expandtemplates. 

The question is how to represent non-wikitext transclusions in the output of
expandtemplates. We'll need an answer to this question in any case.

For the main purpose of my patch, expandtemplates is irrelevant. I added the
special mode that generates <html> specifically to have a consistent wikitext
representation for use by expandtemplates. I could simply disable it just as
well, so no expansion would apply for such templates when calling
expandtemplates (as is done for special page inclusion).

> Even indicating the
> content type explicitly in the API response (rather than inline with an HTML
> tag) would be a better stop-gap as it would avoid some of the security and
> compatibility issues described above.

The content type did not change. It's wikitext.

-- daniel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-14 Thread Gabriel Wicke
On 05/14/2014 01:40 PM, Daniel Kinzler wrote:
>> This means that HTML returned from the preprocessor needs to be valid in
>> wikitext to avoid being stripped out by the sanitizer. Maybe that's actually
>> possible, but my impression is that you are shooting for something that's
>> closer to the behavior of a tag extension. Those already bypass the
>> sanitizer, so would be less troublesome in the short term.
> 
> Yes. Just treat <html>...</html> like a tag extension, and it should work
> fine.
> Do you see any problems with that?

First of all you'll have to make sure that users cannot inject <html> tags
as that would enable arbitrary XSS. I might have missed it, but I believe
that this is not yet done in your current patch.

In contrast to normal tag extensions, <html> would also contain fully
rendered HTML, and should not be piped through action=parse as is done in
Parsoid for tag extensions (in absence of a direct tag extension expansion
API end point). We and other users of the expandtemplates API will have to
add special-case handling for this pseudo tag extension.

In HTML, the <html> tag is also not meant to be used inside the body of a
page. I'd suggest using a different tag name to avoid issues with HTML
parsers and potential name conflicts with existing tag extensions.

Overall it does not feel like a very clean way to do this. My preference
would be to let the consumer directly ask for pre-expanded wikitext *or*
HTML, without overloading action=expandtemplates. Even indicating the
content type explicitly in the API response (rather than inline with an HTML
tag) would be a better stop-gap as it would avoid some of the security and
compatibility issues described above.

>> So it is important to think of renderers as services, so that they are
>> usable from the content API and Parsoid. For existing PHP code this could
>> even be action=parse, but for new renderers without a need or desire to tie
>> themselves to MediaWiki internals I'd recommend to think of them as their
>> own service. This can also make them more attractive to third party
>> contributors from outside the MediaWiki world, as has for example recently
>> happened with Mathoid.
> 
> True, but that has little to do with my patch. It just means that 3rd party
> Content objects should preferably implement getHtml() by calling out to a
> service object.

You are right that it is not an immediate issue with your patch. The point
is about the *longer-term* role of the ContentHandler vs. the content API.
The ContentHandler could either try to be the central piece of our new
content API, or could become an integration point that normally calls out to
the content API and other services to retrieve HTML.

To me the latter is preferable as it enables us to optimize the content API
for high request rates by concentrating on doing one job well, and lets us
leverage this API from the server-side MediaWiki front-end through
ContentHandler.

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-14 Thread Daniel Kinzler
Thanks all for the input!

On 14.05.2014 10:17, Gabriel Wicke wrote:
> On 05/13/2014 05:37 PM, Daniel Kinzler wrote:
> It sounds like this won't work well with current Parsoid. We are using
> action=expandtemplates for the preprocessing of transclusions, and then
> parse the contents using Parsoid. The content is finally
> passed through the sanitizer to keep XSS at bay.
>
> This means that HTML returned from the preprocessor needs to be valid in
> wikitext to avoid being stripped out by the sanitizer. Maybe that's actually
> possible, but my impression is that you are shooting for something that's
> closer to the behavior of a tag extension. Those already bypass the
> sanitizer, so would be less troublesome in the short term.

Yes. Just treat <html>...</html> like a tag extension, and it should work fine.
Do you see any problems with that?

> So it is important to think of renderers as services, so that they are
> usable from the content API and Parsoid. For existing PHP code this could
> even be action=parse, but for new renderers without a need or desire to tie
> themselves to MediaWiki internals I'd recommend to think of them as their
> own service. This can also make them more attractive to third party
> contributors from outside the MediaWiki world, as has for example recently
> happened with Mathoid.

True, but that has little to do with my patch. It just means that 3rd party
Content objects should preferably implement getHtml() by calling out to a
service object.

On 13.05.2014 21:38, Brad Jorsch (Anomie) wrote:
> To avoid the wikitext mangling, you could wrap it in some tag that works
> like <html> if $wgRawHtml is set and <nowiki> otherwise.

But <nowiki> will result in *escaped* HTML. That's just another kind of
mangling. It's, after all, the "normal" result of parsing.

Basically, the <html> mode is for expandtemplates only, and not intended to be
followed up by "actual" parsing.

On 13.05.2014 21:38, Brad Jorsch (Anomie) wrote:
> Or one step further, maybe a tag <html transclusion="{{P}}">html goes here</html>
> that parses just as {{P}} does (and ignores "html goes here" entirely),
> which preserves the property that the output of expandtemplates will mostly
> work when passed back to the parser.

Hm... that's an interesting idea, I'll think about it!

Btw, just so this is mentioned somewhere: it would be very easy to simply not
expand such templates at all in expandtemplates mode, keeping them as {{T}} or
[[T]].

On 14.05.2014 00:11, Matthew Flaschen wrote:
> From working with Dan on this, the main issue is the ResourceLoader module 
> that the diagrams require (it uses a JavaScript library called Vega, plus a
> couple supporting libraries, and simple MW setup code).
> 
> The container element that it needs can be as simple as:
> 
> <div id="..."></div>
> 
> which is actually valid wikitext.

So, there is no server-side rendering at all? It's all done using JS on the
client? Ok then, HTML transclusion isn't the solution.

> Can you outline how RL modules would be handled in the transclusion
> scenario?

The current patch does not really address that problem, I'm afraid. I can think
of two solutions:

* Create a SyntheticHtmlContent class that would hold meta info about modules
etc., just like ParserOutput - perhaps it would just contain a ParserOutput
object. And an equivalent SyntheticWikitextContent class, perhaps. That would
allow us to pass such meta-info around as needed.

* Move the entire logic for HTML based transclusion into the wikitext parser,
where it can just call getParserOutput() on the respective Content object. We
would then no longer need the generic infrastructure for HTML transclusion.
Maybe that would be a better solution in the end.

Hm... yes, I should make an alternative patch using that approach, so we can
compare.
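
Just to make the first option concrete, a bare-bones sketch (class and method
names are made up; none of this is in the current patch):

  <?php
  // Sketch: HTML content that carries its ParserOutput, so module and
  // category information can survive the transclusion step.
  class SyntheticHtmlContent extends TextContent {
      /** @var ParserOutput meta info (modules, categories, ...) */
      private $parserOutput;

      public function __construct( $html, ParserOutput $parserOutput ) {
          parent::__construct( $html, CONTENT_MODEL_HTML );
          $this->parserOutput = $parserOutput;
      }

      // The transcluding parser could merge this into its own output.
      public function getMetaInfo() {
          return $this->parserOutput;
      }
  }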


Thanks for your input!
-- daniel



Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-14 Thread Gabriel Wicke
On 05/13/2014 05:37 PM, Daniel Kinzler wrote:
> Hi all!
> 
> During the hackathon, I worked on a patch that would make it possible for
> non-textual content to be included on wikitext pages using the template
> syntax. The idea is that if we have a content handler that e.g. generates
> awesome diagrams from JSON data, like the extension Dan Andreescu wrote, we
> want to be able to use that output on a wiki page. But until now, that would
> have required the content handler to generate wikitext for the transclusion
> - not easily done.


It sounds like this won't work well with current Parsoid. We are using
action=expandtemplates for the preprocessing of transclusions, and then
parse the contents using Parsoid. The content is finally
passed through the sanitizer to keep XSS at bay.

This means that HTML returned from the preprocessor needs to be valid in
wikitext to avoid being stripped out by the sanitizer. Maybe that's actually
possible, but my impression is that you are shooting for something that's
closer to the behavior of a tag extension. Those already bypass the
sanitizer, so would be less troublesome in the short term. We currently also
can't process transclusions independently to HTML, as we still have to
support unbalanced templates. We are moving in that direction though,
which should also make it easier to support non-wikitext transclusion content.

In the longer term, Parsoid will request pre-sanitized and balanced HTML
from the content API [1,2] for everything but unbalanced wikitext content
[3]. The content API will treat it like any other request, and ask the
storage service for the HTML. If that's found, then it is directly returned
and no rendering happens. This is going to be the typical and fast case. If
there is however no HTML in storage for that revision the content API will
just call the renderer service and save the HTML back / return it to clients
like Parsoid.
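
Schematically (the endpoint shape is purely illustrative):

  Parsoid --> content API:  GET /page/{title}/html/{revision}
  content API --> storage:  look up HTML for that revision
    hit:  return the stored HTML directly (the typical, fast case)
    miss: call the renderer service, save the HTML back, return it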

So it is important to think of renderers as services, so that they are
usable from the content API and Parsoid. For existing PHP code this could
even be action=parse, but for new renderers without a need or desire to tie
themselves to MediaWiki internals I'd recommend to think of them as their
own service. This can also make them more attractive to third party
contributors from outside the MediaWiki world, as has for example recently
happened with Mathoid.

Gabriel

[1]: https://www.mediawiki.org/wiki/Requests_for_comment/Content_API
[2]: https://github.com/gwicke/restface
[3]: We are currently mentoring a GSoC project to collect statistics on
issues like unbalanced templates, which should allow us to systematically
mark those transclusions by wrapping them in a <domparse> tag in wikitext.
All transclusions outside of <domparse> will then be expected to yield
stand-alone HTML.
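
To illustrate, a compound transclusion that is only balanced as a whole might
be marked up like this (template names invented):

  <domparse>
  {{Table start}}
  |-
  | a row added in raw wikitext
  {{Table end}}
  </domparse>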


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-13 Thread Matthew Flaschen

On 05/13/2014 11:37 AM, Daniel Kinzler wrote:

Hi all!

During the hackathon, I worked on a patch that would make it possible for
non-textual content to be included on wikitext pages using the template syntax.
The idea is that if we have a content handler that e.g. generates awesome
diagrams from JSON data, like the extension Dan Andreescu wrote, we want to be
able to use that output on a wiki page. But until now, that would have required
the content handler to generate wikitext for the transclusion - not easily done.


From working with Dan on this, the main issue is the ResourceLoader 
module that the diagrams require (it uses a JavaScript library called 
Vega, plus a couple supporting libraries, and simple MW setup code).


The container element that it needs can be as simple as:

<div id="..."></div>

which is actually valid wikitext.
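
For reference, the module itself would be registered along these lines in the
extension setup file (module name, file paths and dependencies are invented):

  $wgResourceModules['ext.vegaDiagrams'] = array(
      'scripts' => array(
          'lib/vega.js',               // the Vega library
          'ext.vegaDiagrams.init.js',  // simple MW setup code
      ),
      'dependencies' => array( 'mediawiki.util' ),
      'localBasePath' => __DIR__ . '/modules',
      'remoteExtPath' => 'VegaDiagrams/modules',
  );

(ParserOutput::addModules() is what normally attaches such a module to a
rendered page.)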

Can you outline how RL modules would be handled in the transclusion 
scenario?


Matt Flaschen


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-13 Thread Brad Jorsch (Anomie)
On Tue, May 13, 2014 at 11:37 AM, Daniel Kinzler wrote:

> As Brion pointed out in a comment to my original, there is another caveat:
> what should the expandtemplates module do when expanding non-wikitext
> templates? I decided to just wrap the HTML in <html>...</html> tags instead
> of using a strip mark in this case. The resulting wikitext is however only
> "correct" if $wgRawHtml is enabled; otherwise, the HTML will get
> mangled/escaped by wikitext parsing. This seems acceptable to me, but please
> let me know if you have a better idea.
>

Just brainstorming:

To avoid the wikitext mangling, you could wrap it in some tag that works
like <html> if $wgRawHtml is set and <nowiki> otherwise.

Or one step further, maybe a tag <html transclusion="P">html goes here</html>
that parses just as {{P}} does (and ignores "html goes here" entirely),
which preserves the property that the output of expandtemplates will mostly
work when passed back to the parser.
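
For concreteness, with the second option expandtemplates might turn {{P}} into
something like (tag shape inferred from this thread, body invented):

  <html transclusion="P"><div class="diagram">...</div></html>

On a later parse the tag would be re-expanded exactly like {{P}} and the
literal body thrown away, so the round trip stays safe even without
$wgRawHtml.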


-- 
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation

[Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-13 Thread Daniel Kinzler
Hi all!

During the hackathon, I worked on a patch that would make it possible for
non-textual content to be included on wikitext pages using the template syntax.
The idea is that if we have a content handler that e.g. generates awesome
diagrams from JSON data, like the extension Dan Andreescu wrote, we want to be
able to use that output on a wiki page. But until now, that would have required
the content handler to generate wikitext for the transclusion - not easily done.

So, I came up with a way for ContentHandler to wrap the HTML generated by
another ContentHandler so it can be used for transclusion.

Have a look at the patch at . Note
that I have completely rewritten it since my first version at the hackathon.

It would be great to get some feedback on this, and have it merged soon, so we
can start using non-textual content to its full potential.

Here is a quick overview of the information flow. Let's assume we have a
"template" page T that is supposed to be transcluded on a "target" page P; the
template page uses the non-text content model X, while the target page is
wikitext. So:

* When Parser parses P, it encounters {{T}}
* Parser loads the Content object for T (an XContent object, for model X), and
calls getTextForTransclusion() on it, with CONTENT_MODEL_WIKITEXT as the target
format.
* getTextForTransclusion() calls getContentForTransclusion()
* getContentForTransclusion() calls convert( CONTENT_MODEL_WIKITEXT ) which
fails (because content model X doesn't provide a wikitext representation).
* getContentForTransclusion() then calls convertContentViaHtml()
* convertContentViaHtml() calls getTextForTransclusion( CONTENT_MODEL_HTML ) to
get the HTML representation.
* getTextForTransclusion() calls getContentForTransclusion(), which calls
convert(); this time the conversion succeeds, because convert() handles the
conversion to HTML by calling getHtml() directly.
* convertContentViaHtml() takes the HTML and calls makeContentFromHtml() on the
ContentHandler for wikitext.
* makeContentFromHtml() replaces the actual HTML by a parser strip mark, and
returns a WikitextContent containing this strip mark.
* The strip mark is eventually returned to the original Parser instance, and
used to replace {{T}} on the target page.
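
Condensed into code, the fallback chain looks roughly like this (a simplified
sketch; the actual signatures are in the patch):

  <?php
  // Simplified sketch of the conversion fallback described above.
  abstract class SketchContent {
      abstract public function convert( $targetModel );
      abstract public function convertContentViaHtml( $targetModel );

      public function getContentForTransclusion( $targetModel ) {
          // First try a direct conversion, e.g. model X -> wikitext.
          $converted = $this->convert( $targetModel );
          if ( $converted !== false ) {
              return $converted;
          }
          // Otherwise go via HTML: render this content to HTML, then
          // let the target model's handler wrap it (for wikitext, as a
          // parser strip mark via makeContentFromHtml()).
          return $this->convertContentViaHtml( $targetModel );
      }
  }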

This essentially means that any content can be converted to HTML, and can be
transcluded into any content that provides an implementation of
makeContentFromHtml(). This actually changes how transclusion of JS and CSS
pages into wikitext pages works. You can try this out by transcluding a JS
page like MediaWiki:Test.js as a template on a wikitext page.


The old getWikitextForTransclusion() is now a shorthand for
getTextForTransclusion( CONTENT_MODEL_WIKITEXT ).


As Brion pointed out in a comment to my original, there is another caveat: what
should the expandtemplates module do when expanding non-wikitext templates? I
decided to just wrap the HTML in <html>...</html> tags instead of using a strip
mark in this case. The resulting wikitext is however only "correct" if
$wgRawHtml is enabled; otherwise, the HTML will get mangled/escaped by wikitext
parsing. This seems acceptable to me, but please let me know if you have a
better idea.
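
For example (output invented), a call like

  api.php?action=expandtemplates&text={{T}}

would come back with the rendering wrapped as

  <html><svg>...rendered diagram...</svg></html>

which only round-trips cleanly through the parser when $wgRawHtml is enabled.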


So, let me know what you think!
Daniel
