Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-20 Thread Daniel Kinzler
Am 19.05.2014 23:05, schrieb Gabriel Wicke:
 I think we have agreement that some kind of tag is still needed. The main
 point still under discussion is on which tag to use, and how to implement
 this tag in the parser.

Indeed.

 Originally, <domparse> was conceived to be used in actual page content to
 wrap wikitext that is supposed to be parsed to a balanced DOM *as a unit*
 rather than transclusion by transclusion. Once unbalanced compound
 transclusion content is wrapped in <domparse> tags (manually or via bots
 using Parsoid info), we can start to enforce nesting of all other
 transclusions by default. This will make editing safer and more accurate,
 and improve performance by letting us reuse expansions and avoid
 re-rendering the entire page during refreshLinks. See
 https://bugzilla.wikimedia.org/show_bug.cgi?id=55524 for more background.


Ah, I thought you just pulled that out of your hat :)

My main reason for recycling the <html> tag was to not introduce a new tag
extension. <domparse> may occur verbatim in existing wikitext, and would break
when the tag is introduced.

Other than that, I'm fine with outputting whatever tag you like for the
transclusion. Implementing the tag is something else, though - I could implement
it so it will work for HTML transclusion, but I'm not sure I understand the
original <domparse> stuff well enough to get that right. Would <domparse> be in
core, btw?


 Now back to the syntax. Encoding complex transclusions in an HTML parameter
 would be rather cumbersome, and would entail a lot of attribute-specific
 escaping.

Why would it involve any escaping? It should be handled as a tag extension, like
any other.

 $wgRawHtml is disabled in all wikis we are currently interested in.
 MediaWiki does properly report the <html> extension tag from siteinfo when
 $wgRawHtml is enabled, so it ought to work with Parsoid for private wikis.
 It will be harder to support the
 <html transclusion=transclusions></html> exception.

I should try what expandtemplates does with <html> with $wgRawHtml enabled.
Nothing, probably. It will just come back containing raw HTML. Which would be
fine, I think.

By the way: once we agree on a mechanism, it would be trivial to use the same
mechanism for special page transclusion. My patch actually already covers that.
Do you agree that this is the Right Thing? It's just transclusion of HTML
content, after all.

-- daniel


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-20 Thread Gabriel Wicke
On 05/20/2014 02:46 AM, Daniel Kinzler wrote:
 My main reason for recycling the <html> tag was to not introduce a new tag
 extension. <domparse> may occur verbatim in existing wikitext, and would break
 when the tag is introduced.

The only existing mentions of this are probably us discussing it ;) In any
case, it's easy to grep for it & nowikify existing uses.

 Other than that, I'm fine with outputting whatever tag you like for the
 transclusion.

Great!

 Implementing the tag is something else, though - I could implement
 it so it will work for HTML transclusion, but I'm not sure I understand the
 original <domparse> stuff well enough to get that right. Would <domparse> be in
 core, btw?

Yes, it should be in core. I believe that a very simple implementation
(without actual DOM balancing, using Parser::recursiveTagParse()) would not
be too hard. The guts of it are described in [1]. The limitations of
recursiveTagParse should not matter much for this use case.

 Now back to the syntax. Encoding complex transclusions in an HTML parameter
 would be rather cumbersome, and would entail a lot of attribute-specific
 escaping.
 
 Why would it involve any escaping? It should be handled as a tag extension, 
 like
 any other.

Transclusions can contain quotes, which need to be escaped in attribute
values to make sure that the attribute is in fact an attribute. Since quotes
tend to be more common than <domparse> tags this means that there's going to
be more escaping. I also find it harder to scan for quotes ending a long
attribute value. Tags are easier to spot.
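Gabriel's escaping argument can be illustrated with a small sketch (Python for brevity; the function name and the exact wrapper syntax are illustrative, not part of any MediaWiki API): every quote inside a transclusion placed into an attribute value must be entity-escaped, while a tag wrapper like <domparse>...</domparse> would carry the same quotes verbatim.

```python
import html

def wrap_as_attribute(transclusion: str) -> str:
    # Attribute encoding: every double quote inside the transclusion must
    # be escaped, or it would terminate the attribute value early.
    return '<html transclusion="%s"></html>' % html.escape(transclusion, quote=True)

tricky = '{{cite|title="Some quoted title"}}'
print(wrap_as_attribute(tricky))  # quotes come out as &quot;
```

With a tag wrapper, only nested wrapper tags would need treatment; the quotes could stay as-is.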

 $wgRawHtml is disabled in all wikis we are currently interested in.
 MediaWiki does properly report the <html> extension tag from siteinfo when
 $wgRawHtml is enabled, so it ought to work with Parsoid for private wikis.
 It will be harder to support the
 <html transclusion=transclusions></html> exception.
 
 I should try what expandtemplates does with <html> with $wgRawHtml enabled.
 Nothing, probably. It will just come back containing raw HTML. Which would be
 fine, I think.

Yes, that case will work. But $wgRawHtml enabled is the exception, and not
something I'd like to encourage.

 By the way: once we agree on a mechanism, it would be trivial to use the same
 mechanism for special page transclusion. My patch actually already covers 
 that.
 Do you agree that this is the Right Thing? It's just transclusion of HTML
 content, after all.

Yes, that sounds good to me.

Gabriel

[1]:
https://www.mediawiki.org/wiki/Manual:Tag_extensions#How_do_I_render_wikitext_in_my_extension.3F


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Daniel Kinzler
I'm getting the impression there is a fundamental misunderstanding here.

Am 18.05.2014 04:28, schrieb Subramanya Sastry:
 So, consider this wikitext for page P.
 
 == Foo ==
 {{wikitext-transclusion}}
   *a1
<map> .. ... </map>
   *a2
 {{T}} (the html-content-model-transclusion)
   *a3
 
 Parsoid gets wikitext from the API for {{wikitext-transclusion}}, parses it 
 and
injects the tokens into P's content. Parsoid gets HTML from the API for
<map>...</map> and injects the HTML into the not-fully-processed wikitext of P
 (by adding an appropriate token wrapper). So, if {{T}} returns HTML (i.e. the 
 MW
 API lets Parsoid know that it is HTML), Parsoid can inject the HTML into the
 not-fully-processed wikitext and ensure that the final output comes out right
 (in this case, the HTML from both the map extension and {{T}} would not get
 sanitized as it should be).
 
 Does that help explain why we said we don't need the <html> wrapper?

No, it actually misses my point completely. My point is that this may work with
the way parsoid uses expandtemplates, but it does not work for expandtemplates
in general. Because expandtemplates takes full wikitext as input, and only
partially replaces it.

So, let me phrase it this way:

If expandtemplates is called with text=

   == Foo ==
   {{T}}

   [[Category:Bla]]

What should it return, and what content type should be declared in the http 
header?

Note that I'm not talking about how parsoid processes this text. That's not my
point - my point is that expandtemplates can be and is used on full wikitext. In
that context, the return type cannot be HTML.
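Daniel's point can be made concrete with a toy sketch (Python; illustrative only, not the actual preprocessor): expanding only the transclusions leaves all surrounding wikitext untouched, so the response as a whole is still wikitext and cannot be declared text/html.

```python
import re

# Toy stand-in for templates whose content model renders to HTML;
# in MediaWiki this expansion would come from the page's Content object.
HTML_TEMPLATES = {'T': '<table><tr><td>rendered</td></tr></table>'}

def expand(wikitext: str) -> str:
    """Expand only the transclusions; headings, category links and
    other plain wikitext pass through untouched."""
    return re.sub(r'\{\{(\w+)\}\}',
                  lambda m: HTML_TEMPLATES.get(m.group(1), m.group(0)),
                  wikitext)

page = "== Foo ==\n{{T}}\n\n[[Category:Bla]]"
print(expand(page))  # heading and category link are still wikitext
```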

 All that said, if you want to provide the wrapper with <html model=whatever>
 fully-expanded-HTML</html>, we can handle that as well. We'll use the model
 attribute of the wrapper, discard the wrapper and use the contents in our
 pipeline.

Why use the model attribute? Why would you care about the original model? All
you need to know is that you'll get HTML. Exposing the original model in this
context seems useless if not misleading. <html transclude={{T}}></html> would
give that backend parser a way to discard the HTML (as unsafe) and execute the
transclusion instead (generating trusted HTML). In fact, we could just omit the
content of the <html> tag.

 So, model information either as an attribute on the wrapper, api response
 header, or a property in the JSON/XML response structure would all work for 
 us.

As explained above, the return type cannot be HTML for the full text, because
any plain wikitext would stay unprocessed. There needs to be a marker for
HTML transclusion *here* in the text.

Am 18.05.2014 16:29, schrieb Gabriel Wicke:
 The difference between wrapper and property is actually that using inline
 wrappers in the returned wikitext would force us to escape similar wrappers
 from normal template content to avoid opening a gaping XSS hole.

Please explain, I do not see the hole you mention.

If the input contained <html>evil stuff</html>, it would just get escaped by the
preprocessor (unless $wgRawHtml is enabled), as it is now:
https://de.wikipedia.org/w/api.php?action=expandtemplates&text=%3Chtml%3E%3Cscript%3Ealert%28%27evil%27%29%3C/script%3E%3C/html%3E

If <html transclude={{T}}> was passed, the parser/preprocessor would treat it
like it would treat {{T}} - it would get trusted, backend-generated HTML from
the respective Content object.

I see no change, and no opportunity to inject anything. Am I missing something?

 A separate property in the JSON/XML structure avoids the need for escaping
 (and associated security risks if not done thoroughly), and should be
 relatively straightforward to implement and consume.

As explained above, I do not see how this would work except for the very special
case of using expandtemplates to expand just a single template. This could be
solved by introducing a new, single template mode for expandtemplates, e.g.
using expand=Foo|x|y|z instead of text={{Foo|x|y|z}}.
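A sketch of how such an expand parameter might be split into a template name and its arguments (Python; the parameter name and the single-template mode are Daniel's proposal, the helper names and splitting rules are mine):

```python
def parse_expand_param(expand: str):
    """Split 'Foo|x|y|z' into the template title and its positional
    parameters. Naive split: parameter values containing '|' inside
    nested transclusions would need brace-aware parsing instead."""
    title, *params = expand.split('|')
    return title, params

def to_transclusion(expand: str) -> str:
    # Internal rewrite for the wikitext case, on which something like
    # Parser::preprocess could then be invoked.
    return '{{%s}}' % expand

print(parse_expand_param('Foo|x|y|z'))
print(to_transclusion('Foo|x|y|z'))
```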

Another way would be to use hints in the structure returned by generatexml. There,
we have an opportunity to declare a content type for a *part* of the output (or
rather, input).

-- daniel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Subramanya Sastry

On 05/19/2014 04:52 AM, Daniel Kinzler wrote:

I'm getting the impression there is a fundamental misunderstanding here.


You are correct. I completely misunderstood what you said in your last 
response about expandtemplates. So, the rest of my response to your last 
email is irrelevant ... and let me reboot :-).



All that said, if you want to provide the wrapper with <html model=whatever>
fully-expanded-HTML</html>, we can handle that as well. We'll use the model
attribute of the wrapper, discard the wrapper and use the contents in our
pipeline.

Why use the model attribute? Why would you care about the original model? All
you need to know is that you'll get HTML. Exposing the original model in this
context seems useless if not misleading.
Given that I misunderstood your larger observation about 
expandtemplates, this is not relevant now. But, I was basing this on 
your proposal from the previous email which I'll now go back to.


On 05/17/2014 06:14 PM, Daniel Kinzler wrote:
I think something like <html transclusion={{T}}
model=whatever>...</html> would work best.


I see what you are getting at here. Parsoid can treat this like a 
regular tag-extension and send it back to the api=parse endpoint for 
processing. Except if you provided the full expansion as the content of 
the html-wrapper in which case the extra api call can be skipped. The 
extra api call is not really an issue for occasional uses, but on pages 
with a lot of non-wikitext transclusion uses, this is an extra api call 
for each such use. I don't have a sense for how common this would be, so 
maybe that is a premature worry.


That said, for other clients, this content would be deadweight (if they 
are going to discard it and go back to the api=parse endpoint anyway or 
worse send back the entire response to the parser that is going to just 
discard it after the network transfer).


So, looks like there are some conflicting perf. requirements for 
different clients wrt expandtemplates response here. In that context, at 
least from a solely parsoid-centric point of view, the new api endpoint 
'expand=Foo|x|y|z' you proposed would work well as well.


Subbu.


A separate property in the JSON/XML structure avoids the need for escaping
(and associated security risks if not done thoroughly), and should be
relatively straightforward to implement and consume.

As explained above, I do not see how this would work except for the very special
case of using expandtemplates to expand just a single template. This could be
solved by introducing a new, single template mode for expandtemplates, e.g.
using expand=Foo|x|y|z instead of text={{Foo|x|y|z}}.

Another way would be to use hints in the structure returned by generatexml. There,
we have an opportunity to declare a content type for a *part* of the output (or
rather, input).



Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Daniel Kinzler
Am 19.05.2014 14:21, schrieb Subramanya Sastry:
 On 05/19/2014 04:52 AM, Daniel Kinzler wrote:
 I'm getting the impression there is a fundamental misunderstanding here.
 
 You are correct. I completely misunderstood what you said in your last 
 response
 about expandtemplates. So, the rest of my response to your last email is
 irrelevant ... and let me reboot :-).

Glad we got that out of the way :)

 On 05/17/2014 06:14 PM, Daniel Kinzler wrote:
 I think something like <html transclusion={{T}} model=whatever>...</html>
 would work best.
 
 I see what you are getting at here. Parsoid can treat this like a regular
 tag-extension and send it back to the api=parse endpoint for processing.
 Except
 if you provided the full expansion as the content of the <html>-wrapper in which
 case the extra api call can be skipped. The extra api call is not really an
 issue for occasional uses, but on pages with a lot of non-wikitext 
 transclusion
 uses, this is an extra api call for each such use. I don't have a sense for 
 how
 common this would be, so maybe that is a premature worry.

I would probably go for always including the expanded HTML for now.

 That said, for other clients, this content would be deadweight (if they are
 going to discard it and go back to the api=parse endpoint anyway or worse send
 back the entire response to the parser that is going to just discard it after
 the network transfer).

Yes. There could be an option to omit it. That makes the implementation more
complex, but it's doable.

 So, looks like there are some conflicting perf. requirements for different
 clients wrt expandtemplates response here. In that context, at least from a
 solely parsoid-centric point of view, the new api endpoint 'expand=Foo|x|y|z'
 you proposed would work well as well.

That seems the cleanest solution for the parsoid use case - however, the
implementation is complicated by how parameter substitution works. For HTML
based transclusion, it doesn't work at all at the moment - we would need tighter
integration with the preprocessor for doing that.

Basically, there would be two cases: convert expand=Foo|x|y|z to {{Foo|x|y|z}}
internally and call Parser::preprocess on that, so parameter substitution is done
correctly; or get the HTML from Foo, and discard the parameters. We would have
to somehow know in advance which mode to use, handle the appropriate case, and
then set the Content-Type header accordingly. Pretty messy...

I think <html transclusion={{T}}> is the simplest and most robust solution for
now.

-- daniel



Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Gabriel Wicke
On 05/19/2014 09:52 AM, Daniel Kinzler wrote:
 Am 18.05.2014 16:29, schrieb Gabriel Wicke:
 The difference between wrapper and property is actually that using inline
 wrappers in the returned wikitext would force us to escape similar wrappers
 from normal template content to avoid opening a gaping XSS hole.
 
 Please explain, I do not see the hole you mention.
 
 If the input contained <html>evil stuff</html>, it would just get escaped by the
 preprocessor (unless $wgRawHtml is enabled), as it is now:
 https://de.wikipedia.org/w/api.php?action=expandtemplates&text=%3Chtml%3E%3Cscript%3Ealert%28%27evil%27%29%3C/script%3E%3C/html%3E

What you see there is just unescaped HTML embedded in the XML result format.
It's clearer that there's in fact no escaping on the HTML when looking at
the JSON:

https://de.wikipedia.org/w/api.php?action=expandtemplates&text=%3Chtml%3E%3Cscript%3Ealert%28%27evil%27%29%3C/script%3E%3C/html%3E&format=json

Parsoid depends on there being no escaping for unknown tags (and known
extension tags) in the preprocessor.

So if you use <html> tags, you'll have to add escaping for those.

The move to HTML-based (self-contained) transclusion expansions will avoid
this issue completely. That's a few months out though. Maybe we can find a
stop-gap solution that moves in that direction, without introducing special
tags in expandtemplates that we'll have to support for a long time.

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Gabriel Wicke
On 05/19/2014 04:54 PM, Gabriel Wicke wrote:
 The move to HTML-based (self-contained) transclusion expansions will avoid
 this issue completely. That's a few months out though. Maybe we can find a
 stop-gap solution that moves in that direction, without introducing special
 tags in expandtemplates that we'll have to support for a long time.

Here's a proposal:

* Introduce a <domparse> extension tag that causes its content to be parsed
all the way to a self-contained DOM structure. Example:
<domparse>{{T}}</domparse>

* Emit this tag for HTML page transclusions. Avoids the security issue as
there's no way to inject verbatim HTML. Works with Parsoid out of the box.

* Use <domparse> to support parsing unbalanced templates by inserting it
into wikitext:
<domparse>
{{table-start}}
{{table-row}}
{{table-end}}
</domparse>

* Build a solid HTML-only expansion API end point, and start using that for
all transclusions that are not wrapped in <domparse>

* Stop wrapping non-wikitext transclusions into <domparse> in
action=expandtemplates once those can be directly expanded to a
self-contained DOM.

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Gabriel Wicke
On 05/19/2014 10:19 AM, Gabriel Wicke wrote:
 On 05/19/2014 04:54 PM, Gabriel Wicke wrote:
 The move to HTML-based (self-contained) transclusion expansions will avoid
 this issue completely. That's a few months out though. Maybe we can find a
 stop-gap solution that moves in that direction, without introducing special
 tags in expandtemplates that we'll have to support for a long time.
 
 Here's a proposal:
 
 * Introduce a <domparse> extension tag that causes its content to be parsed
 all the way to a self-contained DOM structure. Example:
 <domparse>{{T}}</domparse>
 
 * Emit this tag for HTML page transclusions. Avoids the security issue as
 there's no way to inject verbatim HTML. Works with Parsoid out of the box.
 
 * Use <domparse> to support parsing unbalanced templates by inserting it
 into wikitext:
 <domparse>
 {{table-start}}
 {{table-row}}
 {{table-end}}
 </domparse>
 
 * Build a solid HTML-only expansion API end point, and start using that for
 all transclusions that are not wrapped in <domparse>
 
 * Stop wrapping non-wikitext transclusions into <domparse> in
 action=expandtemplates once those can be directly expanded to a
 self-contained DOM.

Here's a possible division of labor:

You (Daniel) could start with the second step (emitting the tag). Since not
much escaping is needed (only nested <domparse> tags in the transclusion)
this should be fairly straightforward.
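That emit step could be sketched as follows (Python for illustration; the real code would live in PHP in core, and the entity-based escaping of nested tags is my assumption, not a decided rule):

```python
def emit_domparse(transclusion: str) -> str:
    """Wrap a special (non-wikitext) transclusion for expandtemplates
    output. Only nested <domparse> tags need escaping; all other
    content, including quotes, passes through verbatim."""
    safe = (transclusion
            .replace('<domparse>', '&lt;domparse&gt;')
            .replace('</domparse>', '&lt;/domparse&gt;'))
    return '<domparse>%s</domparse>' % safe

print(emit_domparse('{{T|caption="has quotes, no problem"}}'))
```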

We could work on the extension implementation (first bullet point) together,
or tackle it completely on the Parsoid side. We planned to work on this in
any case as part of our longer-term migration to well-balanced HTML
transclusions.

The advantage of using <domparse> to support both unbalanced templates &
special transclusions is that we'll only have to implement this once, and
won't introduce another tag only to deprecate it fairly quickly. Phasing out
unbalanced templates will take longer, as we'll first have to come up with
alternative means to support the same use cases.

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Gabriel Wicke
On 05/19/2014 10:55 AM, Bartosz Dziewoński wrote:
 I am kind of lost in this discussion, but let me just ask one question.
 
 Won't all of the proposed solutions, other than the one of just not
 expanding transclusions that can't be expanded to wikitext, break the
 original and primary purpose of ExpandTemplates: providing valid parsable
 wikitext, for understanding by humans and for pasting back into articles in
 order to bypass transclusion limits?

Yup. But that's the case with <domparse>, while it's not the case with
<html> unless $wgRawHtml is true (which is impossible for publicly-editable
wikis).

 I feel that Parsoid should be using a separate API for whatever it's doing
 with the wikitext. I'm sure that would give you more flexibility with
 internal design as well.

We are moving towards that, but will still need to support unbalanced
transclusions for a while. Since special transclusions can be nested inside
of those we will need some form of inline support even if we expand most
transclusions all the way to DOM with a different end point. Also, as Daniel
pointed out, most other users are using action=expandtemplates for entire
pages and expect that to work as well.

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Bartosz Dziewoński

I am kind of lost in this discussion, but let me just ask one question.

Won't all of the proposed solutions, other than the one of just not expanding 
transclusions that can't be expanded to wikitext, break the original and 
primary purpose of ExpandTemplates: providing valid parsable wikitext, for 
understanding by humans and for pasting back into articles in order to bypass 
transclusion limits?

I feel that Parsoid should be using a separate API for whatever it's doing with 
the wikitext. I'm sure that would give you more flexibility with internal 
design as well.

--
Matma Rex


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Daniel Kinzler
Am 19.05.2014 20:01, schrieb Gabriel Wicke:
 On 05/19/2014 10:55 AM, Bartosz Dziewoński wrote:
 I am kind of lost in this discussion, but let me just ask one question.

 Won't all of the proposed solutions, other than the one of just not
 expanding transclusions that can't be expanded to wikitext, break the
 original and primary purpose of ExpandTemplates: providing valid parsable
 wikitext, for understanding by humans and for pasting back into articles in
 order to bypass transclusion limits?
 
 Yup. But that's the case with <domparse>, while it's not the case with
 <html> unless $wgRawHtml is true (which is impossible for publicly-editable
 wikis).

<html transclusion={{T}}> would work transparently. It would contain HTML, for
direct use by the client, and could be passed back to the parser, which would
ignore the HTML and execute the transclusion. It should be 100% compatible with
existing clients (unless they look for a verbatim <html> tag for some reason).
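The round trip Daniel describes can be sketched from the client side (Python; the wrapper syntax with a quoted attribute is what the sketch's regex assumes, and the helper names are mine - real attribute values would need the escaping discussed elsewhere in this thread):

```python
import re

WRAPPER = re.compile(r'<html transclusion="([^"]*)">(.*?)</html>', re.S)

def use_html_directly(expanded: str) -> str:
    """Client that trusts the backend: keep the contained HTML."""
    return WRAPPER.sub(lambda m: m.group(2), expanded)

def back_to_wikitext(expanded: str) -> str:
    """Client that re-feeds the parser: discard the HTML and restore
    the transclusion, which the parser then re-executes."""
    return WRAPPER.sub(lambda m: m.group(1), expanded)

out = '== Foo ==\n<html transclusion="{{T}}"><b>rendered</b></html>'
print(use_html_directly(out))
print(back_to_wikitext(out))
```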

I'll have to re-read Gabriel's <domparse> proposal tomorrow - right now, I don't
see why it would be necessary, or how it would improve the situation.

 I feel that Parsoid should be using a separate API for whatever it's doing
 with the wikitext. I'm sure that would give you more flexibility with
 internal design as well.
 
 We are moving towards that, but will still need to support unbalanced
 transclusions for a while.

But for HTML based transclusions you could ignore that - you could already
resolve these using a separate API call, if needed.

But still - I do not see why that would be necessary. If expandtemplates returns
<html transclusion={{T}}>, clients can pass that back to the parser safely, or
use the contained HTML directly, safely.

Parsoid would keep working as before: it would treat <html> as a tag extension
(it does that, right?) and pass it back to the parser (which would expand it
again, this time fully, if action=parse is used). If parsoid knows about the
special properties of <html>, it could just use the contents verbatim - I see no
reason why that would be any more unsafe than any other HTML returned by the
parser.

But perhaps I'm missing something obvious. I'll re-read the proposal tomorrow.

-- daniel



Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Gabriel Wicke
On 05/19/2014 12:46 PM, Daniel Kinzler wrote:
 Am 19.05.2014 20:01, schrieb Gabriel Wicke:
 On 05/19/2014 10:55 AM, Bartosz Dziewoński wrote:
 I am kind of lost in this discussion, but let me just ask one question.

 Won't all of the proposed solutions, other than the one of just not
 expanding transclusions that can't be expanded to wikitext, break the
 original and primary purpose of ExpandTemplates: providing valid parsable
 wikitext, for understanding by humans and for pasting back into articles in
 order to bypass transclusion limits?
 
 Yup. But that's the case with domparse, while it's not the case with
 html unless $wgRawHtml is true (which is impossible for publicly-editable
 wikis).
 
 <html transclusion={{T}}> would work transparently. It would contain HTML, for
 direct use by the client, and could be passed back to the parser, which would
 ignore the HTML and execute the transclusion. It should be 100% compatible with
 existing clients (unless they look for a verbatim <html> tag for some reason).

Currently <html> tags are escaped when $wgRawHtml is disabled. We could
change the implementation to stop doing so *iff* the transclusion parameter
is supplied, but IMO that would be fairly unexpected and inconsistent behavior.

 I feel that Parsoid should be using a separate API for whatever it's doing
 with the wikitext. I'm sure that would give you more flexibility with
 internal design as well.
 
 We are moving towards that, but will still need to support unbalanced
 transclusions for a while.
 
 But for HTML based transclusions you could ignore that - you could already
 resolve these using a separate API call, if needed.

Yes, and they are going to be the common case once we have marked up the
exceptions with tags like <domparse>. As you correctly pointed out, inline
tags are primarily needed for expandtemplates calls on compound content,
which we need to do as long as we support unbalanced templates. We can't
know a priori whether some transclusions in turn transclude special HTML
content.

I think we have agreement that some kind of tag is still needed. The main
point still under discussion is on which tag to use, and how to implement
this tag in the parser.

Originally, <domparse> was conceived to be used in actual page content to
wrap wikitext that is supposed to be parsed to a balanced DOM *as a unit*
rather than transclusion by transclusion. Once unbalanced compound
transclusion content is wrapped in <domparse> tags (manually or via bots
using Parsoid info), we can start to enforce nesting of all other
transclusions by default. This will make editing safer and more accurate,
and improve performance by letting us reuse expansions and avoid
re-rendering the entire page during refreshLinks. See
https://bugzilla.wikimedia.org/show_bug.cgi?id=55524 for more background.

The use of <domparse> to mark up special HTML transclusions in
expandtemplates output will be temporary (until HTML transclusions are the
default), but even if such output is pasted into the actual wikitext it
would be harmless, and would work as expected.

Now back to the syntax. Encoding complex transclusions in an HTML parameter
would be rather cumbersome, and would entail a lot of attribute-specific
escaping. Wrapping such transclusions in <domparse> tags on the other hand
normally does not entail any escaping, as only nested <domparse> tags are
problematic.

 Parsoid would keep working as before: it would treat <html> as a tag extension
 (it does that, right?)

$wgRawHtml is disabled in all wikis we are currently interested in.
MediaWiki does properly report the <html> extension tag from siteinfo when
$wgRawHtml is enabled, so it ought to work with Parsoid for private wikis.
It will be harder to support the
<html transclusion=transclusions></html> exception.

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-18 Thread Gabriel Wicke
On 05/18/2014 02:28 AM, Subramanya Sastry wrote:
 However, in his previous message, Gabriel indicated that
 a property in the JSON/XML response structure might work better for
 multi-part responses.

The difference between wrapper and property is actually that using inline
wrappers in the returned wikitext would force us to escape similar wrappers
from normal template content to avoid opening a gaping XSS hole.

A separate property in the JSON/XML structure avoids the need for escaping
(and associated security risks if not done thoroughly), and should be
relatively straightforward to implement and consume.

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-17 Thread Daniel Kinzler
Am 16.05.2014 21:07, schrieb Gabriel Wicke:
 On 05/15/2014 04:42 PM, Daniel Kinzler wrote:
 The one thing that will not work on wikis with
 $wgRawHtml disabled is parsing the output of expandtemplates.
 
 Yes, which means that it won't work with Parsoid, Flow, VE and other users.

And it has been fixed now. In the latest version, expandtemplates will just
return {{Foo}} as it was if {{Foo}} can't be expanded to wikitext.

 I do think that we can do better, and I pointed out possible ways to do so
 in my earlier mail:
 
 My preference
 would be to let the consumer directly ask for pre-expanded wikitext *or*
 HTML, without overloading action=expandtemplates. Even indicating the
 content type explicitly in the API response (rather than inline with an HTML
 tag) would be a better stop-gap as it would avoid some of the security and
 compatibility issues described above.

I don't quite understand what you are asking for... action=parse returns HTML,
action=expandtemplates returns wikitext. The issue was with mixed output, that
is, representing the expansion of templates that generate HTML in wikitext. The
solution I'm going for now is to simply not expand them.

-- daniel



Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-17 Thread Subramanya Sastry
(Top posting to quickly summarize what I gathered from the discussion 
and what would be required for Parsoid to expand pages with these 
transclusions).


Parsoid currently relies on the mediawiki API to preprocess 
transclusions and return wikitext (uses action=expandtemplates for this) 
which it then parses using the native Parsoid pipeline.  Parsoid processes 
extension tags via action=parse and weaves the result back into the 
top-level content of the page.


As per your original email, I am assuming that T is a page with a special 
content model that generates HTML and another page P has a transclusion 
{{T}}.


So, when Parsoid encounters {{T}}, it should be able to replace {{T}} 
with the HTML to generate the right parse output for P.


So, I am listing below 4 possible ways action=expandtemplates can 
process {{T}}


1. Your newest implementation (that just returns back {{T}}):

* If Parsoid gets back {{T}}, one of two things can happen:
--- Parsoid, as usual, tries to parse it as wikitext, and it gets stuck 
in an infinite loop (query MW api for expansion of {{T}}, get back 
{{T}}, parse it as {{T}}, query MW api for expansion of {{T}}, ...). 
So, this will definitely not work.
--- Parsoid adds a special case check to see if the API sent back {{T}}, 
and in which case, requires a different API endpoint 
(action=expandtohtml maybe?) to send back the html expansion based on 
the assumption about output of expandtemplates. This would work and 
would require the new endpoint to be implemented, but feels hacky.
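The client-side special case described in 1. could look roughly like this (a Python sketch; `expand` and `expand_to_html` stand in for the real API calls, and action=expandtohtml is only a hypothetical endpoint from this thread):

```python
def expand_or_fallback(wikitext, expand, expand_to_html):
    """If action=expandtemplates hands back the transclusion unchanged,
    assume a non-wikitext content model and ask a (hypothetical) HTML
    endpoint instead, avoiding the infinite re-expansion loop."""
    expanded = expand(wikitext)
    if expanded == wikitext:
        # No wikitext expansion available; switch endpoints instead of
        # feeding {{T}} back into the expansion pipeline forever.
        return expand_to_html(wikitext)
    return expanded

# Fake API callbacks for illustration:
assert expand_or_fallback("{{T}}", lambda w: w, lambda w: "<div>chart</div>") == "<div>chart</div>"
assert expand_or_fallback("{{Foo}}", lambda w: "expanded", lambda w: "") == "expanded"
```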


So, going back to your original implementation, here are at least 3 ways 
I see this working:


2. action=expandtemplates returns a <html>...</html> for the expansion 
of {{T}}, but also provides an additional API response header that tells 
Parsoid that T was a special content model page and that the raw HTML 
that it received should not be sanitized.


3. action=expandtemplates returns <html>...</html> for the expansion of 
{{T}} and no other indication about T being a special content model page 
or not. However, if Parsoid (and other clients) are to trust this html 
output always without sanitization, the expandtemplates implementation 
should have conditional sanitization of html tags encountered in 
wikitext to prevent XSS. As far as I understand, expandtemplates (on 
master, not your patch) does not do this tag sanitization. But, 
independent of that, what Parsoid and clients need is a guarantee that 
it is safe to blindly splice the contents of any <html>...</html> it 
receives for any {{T}}, no matter what content model T implements.


4. Parsoid first queries the MW-api to find out the content model of T 
for every transclusion {{T}} it encounters on the page P and based on 
the content-model info, knows how to process the output of 
action=expandtemplates.


Clearly 4. is expensive and 3. seems hacky, but if it can be made to 
work, we can work with that.


But, both Gabriel and I think that solution 2. is the cleanest solution 
for now that would work. The PHP parser (in your patch to handle {{T}}) 
already has information about the content model of T when it is 
expanding {{T}}, and it seems simplest and cleanest to return this 
information back to clients for non-default content-model 
expansions. That gives clients like Parsoid the cleanest way of handling 
these.


If I am missing something or this is unclear, and this is getting into too 
much back and forth on email and it is simpler to discuss this on IRC, I 
can hop onto any IRC channel on Monday, or we can do this on 
#mediawiki-parsoid, and one of us could later summarize the discussion 
back onto this thread.


Thanks,
Subbu.


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-17 Thread Subramanya Sastry

On 05/17/2014 10:51 AM, Subramanya Sastry wrote:
So, going back to your original implementation, here are at least 3 
ways I see this working:


2. action=expandtemplates returns a <html>...</html> for the expansion 
of {{T}}, but also provides an additional API response header that 
tells Parsoid that T was a special content model page and that the raw 
HTML that it received should not be sanitized.


Actually, the <html>...</html> wrapper is not even required here, since the 
new API response header (for example, X-Content-Model: HTML) is 
sufficient to know what to do with the response body.
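As a sketch, a client dispatching on that header alone might look like this (Python; X-Content-Model is the hypothetical name proposed in this thread, not a shipped MediaWiki header):

```python
def handle_expansion(body, headers):
    """Decide how to treat the expandtemplates response body based on a
    (hypothetical) X-Content-Model header: raw HTML is spliced in as-is
    and skips the wikitext sanitizer, everything else goes through the
    normal wikitext pipeline."""
    model = headers.get("X-Content-Model", "wikitext")
    if model == "HTML":
        return ("html", body)
    return ("wikitext", body)

assert handle_expansion("<div>chart</div>", {"X-Content-Model": "HTML"})[0] == "html"
assert handle_expansion("{{Foo}} expanded", {})[0] == "wikitext"
```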


Subbu.


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-17 Thread Gabriel Wicke
On 05/17/2014 05:57 PM, Subramanya Sastry wrote:
 On 05/17/2014 10:51 AM, Subramanya Sastry wrote:
 So, going back to your original implementation, here are at least 3 ways I
 see this working:

 2. action=expandtemplates returns a <html>...</html> for the expansion of
 {{T}}, but also provides an additional API response header that tells
 Parsoid that T was a special content model page and that the raw HTML that
 it received should not be sanitized.
 
 Actually, the <html>...</html> wrapper is not even required here since the new
 API response header (for example, X-Content-Model: HTML) is sufficient to
 know what to do with the response body.

Indeed.

Also, instead of the header we can just set a property / attribute in the
JSON/XML response structure. This will also work for multi-part responses,
for example when calling action=expandtemplates on multiple titles.
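A sketch of what such a multi-part response could look like (Python; the field names are hypothetical, not the actual API format) — each part carries its own content model, which a single HTTP header could not express:

```python
import json

# Hypothetical multi-part response: one entry per expanded title, each
# with its own content model.
response = json.dumps({"expandtemplates": [
    {"title": "T", "contentmodel": "html", "text": "<div>chart</div>"},
    {"title": "Foo", "contentmodel": "wikitext", "text": "''expanded''"},
]})

# A client can dispatch per part instead of per response.
models = [p["contentmodel"] for p in json.loads(response)["expandtemplates"]]
assert models == ["html", "wikitext"]
```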

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-17 Thread Daniel Kinzler
Am 17.05.2014 17:57, schrieb Subramanya Sastry:
 On 05/17/2014 10:51 AM, Subramanya Sastry wrote:
 So, going back to your original implementation, here are at least 3 ways I 
 see
 this working:

 2. action=expandtemplates returns a <html>...</html> for the expansion of
 {{T}}, but also provides an additional API response header that tells Parsoid
 that T was a special content model page and that the raw HTML that it 
 received
 should not be sanitized.
 
 Actually, the <html>...</html> wrapper is not even required here since the new 
 API
 response header (for example, X-Content-Model: HTML) is sufficient to know 
 what
 to do with the response body.

But that would only work if {{T}} was the whole text that was being expanded (I
guess that's what you do with Parsoid, right? Took me a minute to realize that).
expandtemplates operates on full wikitext. If the input is something like

  == Foo ==
  {{T}}

  [[Category:Bla]]

Then expanding {{T}} without a wrapper and pretending the result was HTML would
just be wrong.

Regarding trusting the output: MediaWiki core trusts the generated HTML for
direct output. It's no different from the HTML generated by e.g. special pages
in that regard.

I think something like <html transclusion="{{T}}" model="whatever">...</html>
would work best.
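A sketch of producing such a wrapper (Python; the tag shape is Daniel's proposal from this thread, the escaping follows Gabriel's earlier point about attribute values — the function name is mine):

```python
from html import escape

def wrap_expansion(transclusion, model, rendered_html):
    """Build the proposed <html transclusion=... model=...> wrapper.
    Attribute values are escaped so a template name containing quotes
    cannot break out of the tag; the body is the pre-rendered HTML and
    is deliberately left unescaped."""
    return '<html transclusion="%s" model="%s">%s</html>' % (
        escape(transclusion, quote=True), escape(model, quote=True), rendered_html)

tag = wrap_expansion("{{T}}", "html", "<div>chart</div>")
assert tag == '<html transclusion="{{T}}" model="html"><div>chart</div></html>'
```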

-- daniel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-17 Thread Subramanya Sastry

On 05/17/2014 06:14 PM, Daniel Kinzler wrote:

Am 17.05.2014 17:57, schrieb Subramanya Sastry:

On 05/17/2014 10:51 AM, Subramanya Sastry wrote:

So, going back to your original implementation, here are at least 3 ways I see
this working:

2. action=expandtemplates returns a <html>...</html> for the expansion of
{{T}}, but also provides an additional API response header that tells Parsoid
that T was a special content model page and that the raw HTML that it received
should not be sanitized.

Actually, the <html>...</html> wrapper is not even required here since the new API
response header (for example, X-Content-Model: HTML) is sufficient to know what
to do with the response body.

But that would only work if {{T}} was the whole text that was being expanded (I
guess that's what you do with Parsoid, right? Took me a minute to realize that).
expandtemplates operates on full wikitext. If the input is something like

   == Foo ==
   {{T}}

   [[Category:Bla]]

Then expanding {{T}} without a wrapper and pretending the result was HTML would
just be wrong.


Parsoid handles this correctly. We have mechanisms for injecting HTML as 
well as wikitext into the top-level page. For example, tag extensions 
currently return fully expanded html (we use the action=parse API endpoint) 
and we inject that HTML into the page. So, consider this wikitext for 
page P.


== Foo ==
{{wikitext-transclusion}}
  *a1
<map> ... </map>
  *a2
{{T}} (the html-content-model-transclusion)
  *a3

Parsoid gets wikitext from the API for {{wikitext-transclusion}}, parses 
it and injects the tokens into P's content. Parsoid gets HTML from 
the API for <map>...</map> and injects the HTML into the 
not-fully-processed wikitext of P (by adding an appropriate token 
wrapper). So, if {{T}} returns HTML (i.e. the MW API lets Parsoid know 
that it is HTML), Parsoid can inject the HTML into the 
not-fully-processed wikitext and ensure that the final output comes out 
right (in this case, the HTML from both the <map> extension and {{T}} 
would not get sanitized, as it should be).


Does that help explain why we said we don't need the <html> wrapper?

All that said, if you want to provide the wrapper with <html 
model="whatever">fully-expanded-HTML</html>, we can handle that as 
well. We'll use the model attribute of the wrapper, discard the wrapper, 
and use the contents in our pipeline.
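That client-side unwrapping could be sketched like this (Python; a rough regex sketch rather than a real HTML parser, and the `<html model=...>` wrapper shape is the one proposed in this thread):

```python
import re

def unwrap(expansion):
    """Pull the model attribute off the wrapper, discard the wrapper, and
    return (model, contents). Falls back to treating the whole string as
    wikitext when no wrapper is present."""
    m = re.match(r'<html\b[^>]*\bmodel="([^"]*)"[^>]*>(.*)</html>$', expansion, re.S)
    if m:
        return m.group(1), m.group(2)
    return "wikitext", expansion

assert unwrap('<html model="html"><div>chart</div></html>') == ("html", "<div>chart</div>")
assert unwrap("plain ''wikitext''") == ("wikitext", "plain ''wikitext''")
```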


So, model information either as an attribute on the wrapper, an API 
response header, or a property in the JSON/XML response structure would 
all work for us. I don't have clarity on which of these three is the 
best mechanism for providing the template-page content-model information 
to clients... so till such time as I understand that better, I don't have an 
opinion about the specific mechanism. However, in his previous message, 
Gabriel indicated that a property in the JSON/XML response structure 
might work better for multi-part responses.


Subbu.



Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-16 Thread Daniel Kinzler
Hi again!

I have rewritten the patch that enabled HTML based transclusion:

https://gerrit.wikimedia.org/r/#/c/132710/

I tried to address the concerns raised about my previous attempt, namely, how
HTML based transclusion is handled in expandtemplates, and how page meta data
such as resource modules get passed from the transcluded content to the main
parser output (this should work now).

For expandtemplates, I decided to just keep HTML based transclusions as they are
- including special page transclusions. So, expandtemplates will simply leave
{{Special:Foo}} and {{MediaWiki:Foo.js}} in the expanded text, while in the XML
output, you can still see them as template calls.

Cheers,
Daniel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-16 Thread Gabriel Wicke
On 05/15/2014 04:42 PM, Daniel Kinzler wrote:
 The one thing that will not work on wikis with
 $wgRawHtml disabled is parsing the output of expandtemplates.

Yes, which means that it won't work with Parsoid, Flow, VE and other users.

I do think that we can do better, and I pointed out possible ways to do so
in my earlier mail:

 My preference
 would be to let the consumer directly ask for pre-expanded wikitext *or*
 HTML, without overloading action=expandtemplates. Even indicating the
 content type explicitly in the API response (rather than inline with an HTML
 tag) would be a better stop-gap as it would avoid some of the security and
 compatibility issues described above.

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-15 Thread Daniel Kinzler
Am 14.05.2014 16:04, schrieb Gabriel Wicke:
 On 05/14/2014 03:22 PM, Daniel Kinzler wrote:
 My patch doesn't change the handling of <html>...</html> by the parser. As
 before, the parser will pass HTML code in <html>...</html> through only if
 $wgRawHtml is enabled, and will mangle/sanitize it otherwise.
 
 
 Oh, I thought that you wanted to support normal wikis with $wgRawHtml 
 disabled.

I want to, and I do. <html> is not used for normal rendering, it is used by
expandtemplates only. During normal rendering, a strip mark is inserted, which
will work on all wikis. The one thing that will not work on wikis with
$wgRawHtml disabled is parsing the output of expandtemplates.

-- daniel



Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-14 Thread Gabriel Wicke
On 05/13/2014 05:37 PM, Daniel Kinzler wrote:
 Hi all!
 
 During the hackathon, I worked on a patch that would make it possible for
 non-textual content to be included on wikitext pages using the template syntax.
 The idea is that if we have a content handler that e.g. generates awesome
 diagrams from JSON data, like the extension Dan Andreescu wrote, we want to be
 able to use that output on a wiki page. But until now, that would have required
 the content handler to generate wikitext for the transclusion - not easily done.


It sounds like this won't work well with current Parsoid. We are using
action=expandtemplates for the preprocessing of transclusions, and then
parse the contents using Parsoid. The content is finally
passed through the sanitizer to keep XSS at bay.

This means that HTML returned from the preprocessor needs to be valid in
wikitext to avoid being stripped out by the sanitizer. Maybe that's actually
possible, but my impression is that you are shooting for something that's
closer to the behavior of a tag extension. Those already bypass the
sanitizer, so would be less troublesome in the short term. We currently also
can't process transclusions independently to HTML, as we still have to
support unbalanced templates. We are moving into that direction though,
which should also make it easier to support non-wikitext transclusion content.

In the longer term, Parsoid will request pre-sanitized and balanced HTML
from the content API [1,2] for everything but unbalanced wikitext content
[3]. The content API will treat it like any other request, and ask the
storage service for the HTML. If that's found, then it is directly returned
and no rendering happens. This is going to be the typical and fast case. If
there is however no HTML in storage for that revision the content API will
just call the renderer service and save the HTML back / return it to clients
like Parsoid.

So it is important to think of renderers as services, so that they are
usable from the content API and Parsoid. For existing PHP code this could
even be action=parse, but for new renderers without a need or desire to tie
themselves to MediaWiki internals I'd recommend to think of them as their
own service. This can also make them more attractive to third party
contributors from outside the MediaWiki world, as has for example recently
happened with Mathoid.

Gabriel

[1]: https://www.mediawiki.org/wiki/Requests_for_comment/Content_API
[2]: https://github.com/gwicke/restface
[3]: We are currently mentoring a GSoC project to collect statistics on
issues like unbalanced templates, which should allow us to systematically
mark those transclusions by wrapping them in a <domparse> tag in wikitext.
All transclusions outside of <domparse> will then be expected to yield
stand-alone HTML.


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-14 Thread Daniel Kinzler
Thanks all for the input!

Am 14.05.2014 10:17, schrieb Gabriel Wicke:
 On 05/13/2014 05:37 PM, Daniel Kinzler wrote:
 It sounds like this won't work well with current Parsoid. We are using
 action=expandtemplates for the preprocessing of transclusions, and then
 parse the contents using Parsoid. The content is finally
 passed through the sanitizer to keep XSS at bay.

 This means that HTML returned from the preprocessor needs to be valid in
 wikitext to avoid being stripped out by the sanitizer. Maybe that's actually
 possible, but my impression is that you are shooting for something that's
 closer to the behavior of a tag extension. Those already bypass the
 sanitizer, so would be less troublesome in the short term.

Yes. Just treat <html>...</html> like a tag extension, and it should work fine.
Do you see any problems with that?

 So it is important to think of renderers as services, so that they are
 usable from the content API and Parsoid. For existing PHP code this could
 even be action=parse, but for new renderers without a need or desire to tie
 themselves to MediaWiki internals I'd recommend to think of them as their
 own service. This can also make them more attractive to third party
 contributors from outside the MediaWiki world, as has for example recently
 happened with Mathoid.

True, but that has little to do with my patch. It just means that 3rd party
Content objects should preferably implement getHtml() by calling out to a
service object.

Am 13.05.2014 21:38, schrieb Brad Jorsch (Anomie):
 To avoid the wikitext mangling, you could wrap it in some tag that works
 like <html> if $wgRawHtml is set and <pre> otherwise.

But <pre> will result in *escaped* HTML. That's just another kind of mangling.
It's, after all, the normal result of parsing.

Basically, the <html> mode is for expandtemplates only, and not intended to be
followed up by actual parsing.

Am 13.05.2014 21:38, schrieb Brad Jorsch (Anomie):
 Or one step further, maybe a tag <foo wikitext="{{P}}">html goes here</foo>
 that parses just as {{P}} does (and ignores "html goes here" entirely),
 which preserves the property that the output of expandtemplates will mostly
 work when passed back to the parser.

Hm... that's an interesting idea, I'll think about it!

Btw, just so this is mentioned somewhere: it would be very easy to simply not
expand such templates at all in expandtemplates mode, keeping them as {{T}} or
[[T]].

Am 14.05.2014 00:11, schrieb Matthew Flaschen:
 From working with Dan on this, the main issue is the ResourceLoader module 
 that the diagrams require (it uses a JavaScript library called Vega, plus a
 couple supporting libraries, and simple MW setup code).
 
 The container element that it needs can be as simple as:
 
<div data-something="...">...</div>
 
 which is actually valid wikitext.

So, there is no server side rendering at all? It's all done using JS on the
client? Ok then, HTML transclusion isn't the solution.

 Can you outline how RL modules would be handled in the transclusion
 scenario?

The current patch does not really address that problem, I'm afraid. I can think
of two solutions:

* Create a SyntheticHtmlContent class that would hold meta info about modules
etc, just like ParserOutput - perhaps it would just contain a ParserOutput
object. And an equivalent SyntheticWikitextContent class, perhaps. That would
allow us to pass such meta-info around as needed.

* Move the entire logic for HTML based transclusion into the wikitext parser,
where it can just call getParserOutput() on the respective Content object. We
would then no longer need the generic infrastructure for HTML transclusion.
Maybe that would be a better solution in the end.
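The first of the two options above could be sketched roughly like this (Python rather than PHP, and every name here — the class, its fields, the module name — is hypothetical, mirroring ParserOutput-style metadata):

```python
from dataclasses import dataclass, field

@dataclass
class SyntheticHtmlContent:
    """Carry rendered HTML together with ParserOutput-style metadata
    (e.g. the ResourceLoader modules the HTML needs) so transclusion can
    merge it into the top-level page's output."""
    html: str
    modules: list = field(default_factory=list)  # hypothetical RL module names

content = SyntheticHtmlContent('<div data-something="..."></div>',
                               modules=["ext.limn.vega"])  # hypothetical module
assert content.modules == ["ext.limn.vega"]
```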

Hm... yes, I should make an alternative patch using that approach, so we can
compare.


Thanks for your input!
-- daniel



Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-14 Thread Gabriel Wicke
On 05/14/2014 01:40 PM, Daniel Kinzler wrote:
 This means that HTML returned from the preprocessor needs to be valid in
 wikitext to avoid being stripped out by the sanitizer. Maybe that's actually
 possible, but my impression is that you are shooting for something that's
 closer to the behavior of a tag extension. Those already bypass the
 sanitizer, so would be less troublesome in the short term.
 
 Yes. Just treat <html>...</html> like a tag extension, and it should work fine.
 Do you see any problems with that?

First of all you'll have to make sure that users cannot inject <html> tags,
as that would enable arbitrary XSS. I might have missed it, but I believe
that this is not yet done in your current patch.

In contrast to normal tag extensions, <html> would also contain fully
rendered HTML, and should not be piped through action=parse as is done in
Parsoid for tag extensions (in absence of a direct tag extension expansion
API end point). We and other users of the expandtemplates API will have to
add special-case handling for this pseudo tag extension.

In HTML, the <html> tag is also not meant to be used inside the body of a
page. I'd suggest using a different tag name to avoid issues with HTML
parsers and potential name conflicts with existing tag extensions.

Overall it does not feel like a very clean way to do this. My preference
would be to let the consumer directly ask for pre-expanded wikitext *or*
HTML, without overloading action=expandtemplates. Even indicating the
content type explicitly in the API response (rather than inline with an HTML
tag) would be a better stop-gap as it would avoid some of the security and
compatibility issues described above.

 So it is important to think of renderers as services, so that they are
 usable from the content API and Parsoid. For existing PHP code this could
 even be action=parse, but for new renderers without a need or desire to tie
 themselves to MediaWiki internals I'd recommend to think of them as their
 own service. This can also make them more attractive to third party
 contributors from outside the MediaWiki world, as has for example recently
 happened with Mathoid.
 
 True, but that has little to do with my patch. It just means that 3rd party
 Content objects should preferably implement getHtml() by calling out to a
 service object.

You are right that it is not an immediate issue with your patch. The point
is about the *longer-term* role of the ContentHandler vs. the content API.
The ContentHandler could either try to be the central piece of our new
content API, or could become an integration point that normally calls out to
the content API and other services to retrieve HTML.

To me the latter is preferable as it enables us to optimize the content API
for high request rates by concentrating on doing one job well, and lets us
leverage this API from the server-side MediaWiki front-end through
ContentHandler.

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-14 Thread Daniel Kinzler
Am 14.05.2014 15:11, schrieb Gabriel Wicke:
 On 05/14/2014 01:40 PM, Daniel Kinzler wrote:
 This means that HTML returned from the preprocessor needs to be valid in
 wikitext to avoid being stripped out by the sanitizer. Maybe that's actually
 possible, but my impression is that you are shooting for something that's
 closer to the behavior of a tag extension. Those already bypass the
 sanitizer, so would be less troublesome in the short term.

 Yes. Just treat <html>...</html> like a tag extension, and it should work fine.
 Do you see any problems with that?
 
 First of all you'll have to make sure that users cannot inject <html> tags
 as that would enable arbitrary XSS. I might have missed it, but I believe
 that this is not yet done in your current patch.

My patch doesn't change the handling of <html>...</html> by the parser. As
before, the parser will pass HTML code in <html>...</html> through only if
$wgRawHtml is enabled, and will mangle/sanitize it otherwise.

My patch does mean however that the text returned by expandtemplates may not
render as expected when processed by the parser. Perhaps anomie's approach of
preserving the original template call would work, something like:

  <html template="{{T}}">...</html>

Then, the parser could apply the normal expansion when encountering the tag,
ignoring the pre-rendered HTML.

 In contrast to normal tag extensions, <html> would also contain fully
 rendered HTML, and should not be piped through action=parse as is done in
 Parsoid for tag extensions (in absence of a direct tag extension expansion
 API end point). We and other users of the expandtemplates API will have to
 add special-case handling for this pseudo tag extension.

Handling for the <html> tag should already be in place, since it's part of the
core spec. The issue is only to know when to allow/trust such <html> tags, and
when to treat them as plain text (or like a <pre> tag).

 In HTML, the <html> tag is also not meant to be used inside the body of a
 page. I'd suggest using a different tag name to avoid issues with HTML
 parsers and potential name conflicts with existing tag extensions.

As above: <html> is part of the core syntax, to support $wgRawHtml. It's just
disabled by default.

 Overall it does not feel like a very clean way to do this. My preference
 would be to let the consumer directly ask for pre-expanded wikitext *or*
 HTML, without overloading action=expandtemplates. 

The question is how to represent non-wikitext transclusions in the output of
expandtemplates. We'll need an answer to this question in any case.

For the main purpose of my patch, expandtemplates is irrelevant. I added the
special mode that generates <html> specifically to have a consistent wikitext
representation for use by expandtemplates. I could simply disable it just as
well, so no expansion would apply for such templates when calling
expandtemplates (as is done for special page inclusion).

 Even indicating the
 content type explicitly in the API response (rather than inline with an HTML
 tag) would be a better stop-gap as it would avoid some of the security and
 compatibility issues described above.

The content type did not change. It's wikitext.

-- daniel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-14 Thread Gabriel Wicke
On 05/14/2014 03:22 PM, Daniel Kinzler wrote:
 My patch doesn't change the handling of <html>...</html> by the parser. As
 before, the parser will pass HTML code in <html>...</html> through only if
 $wgRawHtml is enabled, and will mangle/sanitize it otherwise.


Oh, I thought that you wanted to support normal wikis with $wgRawHtml disabled.

 The content type did not change. It's wikitext.

Anything is wikitext ;)

Gabriel


Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-14 Thread Dan Andreescu

  Can you outline how RL modules would be handled in the transclusion
  scenario?

 The current patch does not really address that problem, I'm afraid. I can
 think of two solutions:

 * Create a SyntheticHtmlContent class that would hold meta info about
 modules etc, just like ParserOutput - perhaps it would just contain a
 ParserOutput object. And an equivalent SyntheticWikitextContent class,
 perhaps. That would allow us to pass such meta-info around as needed.

 * Move the entire logic for HTML based transclusion into the wikitext
 parser, where it can just call getParserOutput() on the respective Content
 object. We would then no longer need the generic infrastructure for HTML
 transclusion. Maybe that would be a better solution in the end.

 Hm... yes, I should make an alternative patch using that approach, so we
 can compare.


Thanks a lot Daniel, I'm happy to help test / try out any solutions you
want to experiment with.  I've moved my work to gerrit:
https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/Limn
and the last commit (with a lot of help from Matt F.) may be ready for you
to use as a use case.  Let me know if it'd be helpful to install this
somewhere in labs.

Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-13 Thread Brad Jorsch (Anomie)
On Tue, May 13, 2014 at 11:37 AM, Daniel Kinzler dan...@brightbyte.de wrote:

 As Brion pointed out in a comment to my original, there is another caveat: what
 should the expandtemplates module do when expanding non-wikitext templates? I
 decided to just wrap the HTML in <html>...</html> tags instead of using a strip
 mark in this case. The resulting wikitext is however only correct if
 $wgRawHtml is enabled, otherwise, the HTML will get mangled/escaped by wikitext
 parsing. This seems acceptable to me, but please let me know if you have a
 better idea.


Just brainstorming:

To avoid the wikitext mangling, you could wrap it in some tag that works
like <html> if $wgRawHtml is set and <pre> otherwise.

Or one step further, maybe a tag <foo wikitext="{{P}}">html goes here</foo>
that parses just as {{P}} does (and ignores "html goes here" entirely),
which preserves the property that the output of expandtemplates will mostly
work when passed back to the parser.


-- 
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation

Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-13 Thread Matthew Flaschen

On 05/13/2014 11:37 AM, Daniel Kinzler wrote:

Hi all!

During the hackathon, I worked on a patch that would make it possible for
non-textual content to be included on wikitext pages using the template syntax.
The idea is that if we have a content handler that e.g. generates awesome
diagrams from JSON data, like the extension Dan Andreescu wrote, we want to be
able to use that output on a wiki page. But until now, that would have required
the content handler to generate wikitext for the transclusion - not easily done.


From working with Dan on this, the main issue is the ResourceLoader 
module that the diagrams require (it uses a JavaScript library called 
Vega, plus a couple supporting libraries, and simple MW setup code).


The container element that it needs can be as simple as:

<div data-something="...">...</div>

which is actually valid wikitext.

Can you outline how RL modules would be handled in the transclusion 
scenario?


Matt Flaschen
