Thanks for the responses. I do want to convert HTML that cannot be assumed
to be clean, so it sounds like Parsoid will not solve the problem for now.

--James

On Fri, Nov 6, 2015 at 11:06 AM, Gabriel Wicke <gwi...@wikimedia.org> wrote:

> To add to what Eric & Subbu have said, here is a link to the API
> documentation for the HTML-to-wikitext endpoint:
>
>
> https://en.wikipedia.org/api/rest_v1/?doc#!/Transforms/post_transform_html_to_wikitext_title_revision
>
> On Fri, Nov 6, 2015 at 8:47 AM, Subramanya Sastry <ssas...@wikimedia.org>
> wrote:
>
> > On 11/06/2015 10:18 AM, James Montalvo wrote:
> >
> >> Can Parsoid be used to convert arbitrary HTML to wikitext? It's not clear
> >> to me whether it will only work with Parsoid's HTML+RDFa. I'm wondering if
> >> I could take snippets of HTML from non-MediaWiki webpages and convert them
> >> into wikitext.
> >>
> >
> > The right answer is: "It depends" :-)
> >
> > As Eric responded in his reply, Parsoid does convert some kinds of
> > arbitrary HTML to clean wikitext. See some additional examples at the end
> > of this email.
> >
> > However, if you really threw arbitrary HTML at it (e.g. <em>..</em> or
> > <strong>..</strong>), Parsoid wouldn't know that it could potentially use
> > '' or ''' for those tags. Likewise, if you gave it input with all kinds of
> > CSS and other inline attributes, you wouldn't necessarily get the best
> > wikitext from it.
> >
> > Similarly, if you tried to convert HTML that you got from, say, Google
> > Docs, OpenOffice, Word, or other HTML-generation tools, the wikitext you
> > get may not be very pretty.
> >
> > We do want to keep improving Parsoid's abilities here, but it has not
> > been a high priority for us, given that we are always playing catch-up
> > with all the other things we need to get done. It would be a great GSoC
> > or volunteer project if someone wants to play with this and improve this
> > feature.
> >
> > That said, if your HTML isn't really arbitrary, you can get
> > reasonable-looking wikitext out of it even without the markers. Things
> > like images, templates, and extensions, however, obviously require the
> > additional attributes for Parsoid to generate canonical wikitext for them.
> >
> > Hope this helps.
> >
> > Subbu.
> >
> >
> >
> > -------------------------------------------------------------------------------------------
> >
> > Some html -> wt examples:
> >
> > [subbu@earth bin] echo "<h2>foo</h2><p>a</p><p>b</p>" | node parse --html2wt
> > == foo ==
> > a
> >
> > b
> > [subbu@earth bin] echo "<a href='http://en.wikipedia.org/wiki/Hampi'>Hampi</a>" | node parse --html2wt
> > [[Hampi]]
> >
> > [subbu@earth bin] echo "<a href='http://it.wikipedia.org/wiki/Luna'>Luna</a>" | node parse --html2wt
> > [[:it:Luna|Luna]]
> >
> > [subbu@earth bin] echo "<a href='http://it.wikipedia.org/wiki/Luna'>Luna</a>" | node parse --html2wt --prefix itwiki
> > [[Luna]]
> >
> > [subbu@earth bin] echo "<ul><li>a</li><li>b</li><li>c</li></ul>" | node parse --html2wt
> > * a
> > * b
> > * c
> >
> > [subbu@earth bin] echo "<em>foo</em>" | node parse --html2wt
> > <em>foo</em>
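
A possible workaround for the <em> case above, offered as a sketch rather
than anything Parsoid itself provides: rewrite <em>/<strong> to <i>/<b>
before conversion. This assumes that Parsoid serializes <i> and <b> to ''
and ''', and that trading emphasis semantics for plain italic/bold is
acceptable. The helper name normalize_emphasis is hypothetical, and the
patterns only handle lowercase tags.

```shell
# Hypothetical pre-processing helper (not part of Parsoid).
# Reads HTML on stdin and rewrites <em>/<strong> tags, with or without
# attributes, to <i>/<b>. Other tags such as <embed> are left untouched.
normalize_emphasis() {
  sed -E \
    -e 's#<(/?)em([[:space:]][^>]*)?>#<\1i\2>#g' \
    -e 's#<(/?)strong([[:space:]][^>]*)?>#<\1b\2>#g'
}
```

It could then be placed in front of the converter, for example
echo "<em>foo</em>" | normalize_emphasis | node parse --html2wt. The
resulting wikitext has not been verified here.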
> >
> >
> > _______________________________________________
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
>
>
>
> --
> Gabriel Wicke
> Principal Engineer, Wikimedia Foundation