[Wikitech-l] Cutting MediaWiki loose from wikitext

2012-04-30 Thread Daniel Kinzler
Hi all

Moving forward, I have just committed a first patch for review:

https://gerrit.wikimedia.org/r/#change,6101

Please have a look if you are interested.

-- daniel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-27 Thread Daniel Kinzler
On 27.03.2012 00:09, Platonides wrote:
 It looks really evil publishing that svn branch just days after the git
 migration :)
 I think that branch -created months ago- should be migrated to git, so
 we could all despair..^W benefit from git's wonderful branching abilities.

Indeed - when I asked Chad about that, he said "ask me again once the dust has
settled". I'd be happy to have this in git.

Or... well, maybe I'll just make a patch from that branch, make a fresh branch
in git, and cherry-pick the changes, trying to keep things minimal. Yeah, that's
probably the best thing to do.

-- daniel



Re: [Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-27 Thread Daniel Kinzler
On 27.03.2012 00:33, Tim Starling wrote:
 For the record: we've discussed this previously and I'm fine with it.
 It's a well thought-out proposal, and the only request I had was to
 ensure that the DB schema supports some similar projects that we have
 in the idea pile, like multiple parser versions.

Thanks Tim! The one important bit I'd like to hear from you is... do you think
it is feasible to get this not only implemented but also reviewed and deployed
by August?... We are on a tight schedule with Wikidata, and this functionality
is a major blocker.

I think implementing ContentHandlers for MediaWiki would have a lot of benefits
for the future, but if it's not feasible to get it in quickly, I have to think
of an alternative way to implement structured data storage.

Thanks
Daniel




Re: [Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-27 Thread Daniel Kinzler
On 27.03.2012 00:37, MZMcBride wrote:
 It's an ancient assumption that's built in to many parts of MediaWiki (and
 many outside tools and scripts). Is there any kind of assessment about the
 level of impact this would have?

Not formally, just my own poking at the code base. There are a lot of places in
the code that access revision text and do something with it; not all of them can
easily be found or changed (this is especially true for extensions).

My proposal covers a compatibility layer that will cause legacy code to just see
an empty page when trying to access the contents of a non-wikitext page. Only
code aware of content models will see any non-wikitext content. This should
avoid most problems, and should ensure that things will work as before at least
for everything that is wikitext.
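
A minimal sketch of how such a compatibility layer might behave, written in
Python rather than MediaWiki's PHP. The names `Revision`, `get_text` and
`get_content` are hypothetical illustrations and are not taken from the
proposal or from MediaWiki itself:

```python
from dataclasses import dataclass
from typing import Tuple

WIKITEXT = "wikitext"  # assumed identifier for the default content model


@dataclass
class Revision:
    """Hypothetical stand-in for a stored revision."""

    model: str  # content model identifier, e.g. "wikitext" or "json"
    blob: str   # serialized content as it sits in storage

    def get_text(self) -> str:
        """Legacy accessor used by code that assumes wikitext.

        Non-wikitext content is hidden behind an empty string, so
        unaware callers simply see an empty page.
        """
        return self.blob if self.model == WIKITEXT else ""

    def get_content(self) -> Tuple[str, str]:
        """Model-aware accessor: returns (model, serialized content)."""
        return self.model, self.blob
```

Under these assumptions, legacy callers keep working unchanged for wikitext
pages, and only callers that switch to the model-aware accessor ever see
other kinds of content.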

 For example, would the diff engine need to be rewritten so that people can
 monitor these pages for vandalism? 

A diff engine needs to be implemented for each content model. The existing
engine(s) do not need to be rewritten; they will be used for all wikitext pages.

 Will these pages be editable in the same
 way as current wikitext pages? 

No. The entire point of this proposal is to be able to neatly supply specialized
display, editing and diffing of different kinds of content.

 If not, will there be special editors for the
 various data types? 

Indeed.

 What other parts of the MediaWiki codebase will be
 affected and to what extent? 

A few classes (like Revision or WikiPage) need some major additions or changes,
see the proposal on meta. Lots of places should eventually be changed to become
aware of content models, but don't need to be adapted immediately (see above).

 Will text still go in the text table or will
 separate tables and infrastructure be used?

Uh, did you read the proposal?...

All content is serialized just before storing it. It is stored in the text
table using the same code as before. The content model and serialization format
are recorded in the revision table.
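
As a rough illustration of that storage flow (a Python sketch under stated
assumptions, not the actual implementation): the in-memory `Store`, the
`RevisionRow` fields, the model name "wikidata-item" and the format strings
are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class RevisionRow:
    """Hypothetical revision-table row pointing at a text-table blob."""

    text_id: int
    content_model: str   # e.g. "wikitext" or "wikidata-item" (assumed names)
    content_format: str  # e.g. "text/plain" or "application/json" (assumed)


class Store:
    """In-memory stand-in for the text and revision tables."""

    def __init__(self) -> None:
        self.text_table: Dict[int, str] = {}
        self.revision_table: List[RevisionRow] = []

    def save(self, data: Any, model: str, fmt: str) -> RevisionRow:
        # Serialize just before storing; the text table only ever sees a blob.
        if fmt == "application/json":
            blob = json.dumps(data, sort_keys=True)
        elif fmt == "text/plain":
            blob = str(data)
        else:
            raise ValueError(f"unsupported serialization format: {fmt!r}")
        text_id = len(self.text_table) + 1
        self.text_table[text_id] = blob
        # Model and format are recorded alongside the revision, not the blob.
        row = RevisionRow(text_id, model, fmt)
        self.revision_table.append(row)
        return row
```

The point of the split is that the blob storage stays ignorant of content
types, while the revision row carries enough information to pick the right
handler when the blob is read back.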

Secondary data (index data, analogous to the link tables) may be extracted from
the content and stored in separate database tables, or in some other service, as
needed.

 I'm reminded a little of LiquidThreads for some reason. This idea sounds
 good, but I'm worried about the implementation details, particularly as the
 assumption you seek to upend is so old and ingrained.

It's more like the transition to using MediaHandlers instead of assuming
uploaded files to be images: existing concepts and actions are generalized to
apply to more types of content.

LiquidThreads introduces new concepts (threads, conversations) and interactions
(re-arranging, summarizing, etc.) and tries to integrate them with the concepts
used for wiki pages. This seems far more complicated to me.

 The background is that the Wikidata project needs a way to store structured
 data (JSON) on wiki pages instead of wikitext. Having a pluggable system would
 solve that problem along with several others, like doing away with the special
 cases for JS/CSS, the ability to maintain categories etc separate from body
 text, manage Gadgets sanely on a wiki page, or several other things (see the
 link below).
 
 How would this affect categories being stored in wikitext (alongside the
 rest of the page content text)? That part doesn't make any sense to me.

Imagine a data model that works like mime/multipart email: you have a wrapper
that contains the main text as well as attachments. The whole shebang gets
serialized and stored in the text table, as usual. For displaying, editing and
visualizing, you have code that is aware of the multipart nature of the content,
and puts the parts together nicely.
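
A minimal Python sketch of that wrapper idea, assuming a JSON envelope; the
envelope layout and the function names are hypothetical and only meant to
show the main text and its attachments travelling as one serialized blob:

```python
import json
from typing import Any, Dict, Tuple


def serialize_multipart(main_text: str, attachments: Dict[str, Any]) -> str:
    """Wrap the main text and named attachments into a single blob."""
    wrapper = {"main": main_text, "attachments": attachments}
    return json.dumps(wrapper, sort_keys=True)


def parse_multipart(blob: str) -> Tuple[str, Dict[str, Any]]:
    """Split a stored blob back into the main text and its attachments."""
    wrapper = json.loads(blob)
    return wrapper["main"], wrapper["attachments"]


# Example: categories kept apart from the body text (illustrative data).
blob = serialize_multipart("Some article text.", {"categories": ["Cities"]})
main, parts = parse_multipart(blob)
```

Only code that knows about the envelope needs to call `parse_multipart`;
storage just sees one string.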

However, the category stuff is a use case I'm just mentioning because it has been
requested so often in the past (namely, editing categories, interlanguage links,
etc. separately from the wikitext); this mechanism is not essential to the
concept of ContentHandlers, and not something I plan to implement for the
Wikidata project. It's just something that will become much easier once we have
ContentHandlers.

-- daniel



Re: [Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-27 Thread Alex Brollo
I can't follow the details of this discussion, but if you like, take a look at
the raw code of any ns0 page on it.wikisource and note that the "area dati"
is removed from the wikitext as soon as a user opens the page in edit mode,
and rebuilt when the user saves it; or take a look here:
http://it.wikisource.org/wiki/MediaWiki:Variabili.js where data used for
edit automation/help are collected as JS objects.


Alex brollo


Re: [Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-27 Thread Daniel Kinzler
On 27.03.2012 09:47, Alex Brollo wrote:
 I can't follow the details of this discussion, but if you like, take a look at
 the raw code of any ns0 page on it.wikisource and note that the "area dati"
 is removed from the wikitext as soon as a user opens the page in edit mode,
 and rebuilt when the user saves it; or take a look here:
 http://it.wikisource.org/wiki/MediaWiki:Variabili.js where data used for
 edit automation/help are collected as JS objects.

Yes. Basically, the ContentHandler proposal would introduce native support for
this kind of thing into MediaWiki, instead of implementing it as a hack with
JavaScript. Wouldn't it be nice to get input forms for this data, or have nice
diffs of the structure, or good search results for data records?... Not to
mention the ability to actually query for individual data fields :)

-- daniel



Re: [Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-27 Thread Antoine Musso
Daniel Kinzler wrote:
 A very rough prototype is in a dev branch here:
 
   http://svn.wikimedia.org/svnroot/mediawiki/branches/Wikidata/phase3/

I guess we could have that migrated to Gerrit and review the project there.

-- 
Antoine "hashar" Musso




Re: [Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-27 Thread Daniel Kinzler
On 27.03.2012 11:26, Antoine Musso wrote:
 Daniel Kinzler wrote:
 A very rough prototype is in a dev branch here:

   http://svn.wikimedia.org/svnroot/mediawiki/branches/Wikidata/phase3/
 
 I guess we could have that migrated to Gerrit and review the project there.

Sure, fine with me :) Though I will likely make a new branch and merge my
changes again more cleanly. What's there now is really a proof of concept. But
sure, have a look!

-- daniel




[Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-26 Thread Daniel Kinzler
Hi all. I have a bold proposal (read: evil plan).

To put it briefly: I want to remove the assumption that MediaWiki pages always
contain wikitext. Instead, I propose a pluggable handler system for different
types of content, similar to what we have for file uploads. So, I propose to
associate a content model identifier with each page, and have handlers for
each model that provide serialization, rendering, an editor, etc.
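
To make the shape of the idea concrete, here is a rough Python sketch of a
handler registry (MediaWiki itself is PHP). The class names, method names and
model identifiers are illustrative assumptions, not the interface described on
the notes page, and the wikitext "rendering" is a placeholder rather than a
real parser:

```python
import html
import json
from abc import ABC, abstractmethod
from typing import Any, Dict


class ContentHandler(ABC):
    """Hypothetical per-model handler: serialization plus rendering."""

    @abstractmethod
    def serialize(self, content: Any) -> str: ...

    @abstractmethod
    def unserialize(self, blob: str) -> Any: ...

    @abstractmethod
    def render(self, content: Any) -> str: ...


class WikitextHandler(ContentHandler):
    def serialize(self, content: Any) -> str:
        return str(content)

    def unserialize(self, blob: str) -> Any:
        return blob

    def render(self, content: Any) -> str:
        # Placeholder only: a real handler would invoke the wikitext parser.
        return "<p>" + html.escape(str(content)) + "</p>"


class JsonHandler(ContentHandler):
    def serialize(self, content: Any) -> str:
        return json.dumps(content, sort_keys=True)

    def unserialize(self, blob: str) -> Any:
        return json.loads(blob)

    def render(self, content: Any) -> str:
        pretty = json.dumps(content, indent=2, sort_keys=True)
        return "<pre>" + html.escape(pretty) + "</pre>"


# Registry keyed by content model identifier (identifiers are assumed).
HANDLERS: Dict[str, ContentHandler] = {
    "wikitext": WikitextHandler(),
    "json": JsonHandler(),
}


def get_handler(model: str) -> ContentHandler:
    """Look up the handler for a page's content model."""
    try:
        return HANDLERS[model]
    except KeyError:
        raise ValueError(f"no handler registered for model {model!r}") from None
```

A page would then only need to carry its model identifier; everything else
(editor, diff, rendering) hangs off whatever `get_handler` returns.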

The background is that the Wikidata project needs a way to store structured data
(JSON) on wiki pages instead of wikitext. Having a pluggable system would solve
that problem along with several others, like doing away with the special cases
for JS/CSS, the ability to maintain categories etc separate from body text,
manage Gadgets sanely on a wiki page, or several other things (see the link
below).

I have described my plans in more detail on meta:

  http://meta.wikimedia.org/wiki/Wikidata/Notes/ContentHandler

A very rough prototype is in a dev branch here:

  http://svn.wikimedia.org/svnroot/mediawiki/branches/Wikidata/phase3/

Please let me know what you think (here on the list, preferably, not on the talk
page there, at least for now).

Note that we *definitely* need this ability for Wikidata. We could do it
differently, but I think this would be the cleanest solution, and would have a
lot of mid- and long-term benefits, even if it's a short-term pain. I'm
presenting my plan here to find out if I'm on the right track, and whether it is
feasible to put this on the road map for 1.20. It would be my (and the Wikidata
team's) priority to implement this and see it through before Wikimania. I'm
convinced we have the manpower to get it done.

Cheers,
Daniel



Re: [Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-26 Thread Alex Brollo
I agree that it's ironic to work with a powerful database-backed project
and yet have neither access nor encouragement to organize our data as it
should be organized. But we do use normal pages as a data repository too,
simply marking some specific areas of pages as data areas. Moreover, we use
the same page both as a normal wikitext container and as a data container.
Why not?

Alex brollo (it.source)


Re: [Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-26 Thread John Erling Blad
I like this idea, it solves a lot of problems.
John



Re: [Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-26 Thread Brion Vibber
I'm generally in favor of this plan. I haven't looked over the specific
code experiments yet but the plan sounds solid. A few notes:

* over time we'll want to do things like migrate File: pages from 'plain
wikitext that happens to have an associated file' to 'structured data about
a file'. This will be magnificent.

* I wouldn't overmuch emphasize things like "oh, you could have pages in
markdown or tex!", though it does sound neat and all. :)

* we need to make sure that import/export round-trips things consistently,
including for non-wikitext stuff. Either that means making import/export
content-aware, or shipping the serialized form through the export XML?


As for timing: Daniel's hoping for something in the neighborhood of an
August deployment. I think if we keep things minimal that should be
feasible; it's somewhat similar to the migration of Image stuff with
MediaHandler classes.

I'm a bit uncertain about the idea of 'multipart' pages, though attached
data YES YES in some clean way is needed.

-- brion




Re: [Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-26 Thread Daniel Kinzler
On 26.03.2012 22:02, Brion Vibber wrote:
 I'm generally in favor of this plan. I haven't looked over the specific
 code experiments yet but the plan sounds solid. 

YAY!

 * over time we'll want to do things like migrate File: pages from 'plain
 wikitext that happens to have an associated file' to 'structured data about
 a file'. This will be magnificent.

I hope to get the WMNL guys excited about this idea, this would really rock for
GLAM applications.

 * I wouldn't overmuch emphasize things like "oh, you could have pages in
 markdown or tex!", though it does sound neat and all. :)

Yes. For the record, I do *not* want to move Wikipedia's format to another
syntax. (Well, I wish it *used* another syntax, but that's a completely separate
discussion).

 * we need to make sure that import/export round-trips things consistently,
 including for non-wikitext stuff. Either that means making import/export
 content-aware, or shipping the serialized form through the export XML?

I intend the importer/exporter to use the serialized form, and to be aware only
of the additional revision attributes specifying the content model and
serialization format.

How a wiki should react when importing content for an unknown handler is an open
issue, though. Fail? Import a blank page? Import as wikitext?...

But we don't need to solve that here and now.
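
For illustration only, a small Python sketch of exporting the serialized form
with model and format attributes, and of the three candidate reactions to an
unknown model on import. The XML element and attribute names, the policy
names and the set of known models are hypothetical, not the actual export
format:

```python
import xml.etree.ElementTree as ET
from typing import Tuple

KNOWN_MODELS = {"wikitext", "json"}  # assumed locally registered models


def export_revision(blob: str, model: str, fmt: str) -> str:
    """Ship the serialized form as-is, tagged with model and format."""
    rev = ET.Element("revision", {"model": model, "format": fmt})
    rev.text = blob
    return ET.tostring(rev, encoding="unicode")


def import_revision(xml_text: str, on_unknown: str = "fail") -> Tuple[str, str, str]:
    """Return (model, format, blob), applying a policy for unknown models."""
    rev = ET.fromstring(xml_text)
    model = rev.get("model", "wikitext")
    fmt = rev.get("format", "text/plain")
    blob = rev.text or ""
    if model in KNOWN_MODELS:
        return model, fmt, blob
    if on_unknown == "fail":
        raise ValueError(f"no handler for content model {model!r}")
    if on_unknown == "blank":
        return "wikitext", "text/plain", ""
    if on_unknown == "as_wikitext":
        return "wikitext", "text/plain", blob
    raise ValueError(f"unknown import policy: {on_unknown!r}")
```

Note that the importer never inspects the blob itself; it only looks at the
two attributes, which is what keeps it independent of individual handlers.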

 As for timing: Daniel's hoping for something in the neighborhood of an
 August deployment. I think if we keep things minimal that should be
 feasible; it's somewhat similar to the migration of Image stuff with
 MediaHandler classes.

This is because of Wikidata's tight timeline. We'll be working hard on getting
this ready soon.

 I'm a bit uncertain about the idea of 'multipart' pages, though attached
 data YES YES in some clean way is needed.

That bit is mostly idle musing - multipart and attachments are *not* needed
for Wikidata, though they open up several neat use cases.

Thanks for the feedback Brion!

-- daniel



Re: [Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-26 Thread Daniel Kinzler
On 26.03.2012 18:18, Alex Brollo wrote:
 I agree that it's ironic to work with a powerful database-backed project
 and yet have neither access nor encouragement to organize our data as it
 should be organized. But we do use normal pages as a data repository too,
 simply marking some specific areas of pages as data areas. Moreover, we use
 the same page both as a normal wikitext container and as a data container.
 Why not?

Because it is not sufficient. There is no way to query such data efficiently,
there is no standard web API to access this data, and there are no URLs to
reference it (without the text around it).

The proposal allows for structured data as page content, as well as any other
type of page content, and it also potentially allows multiple types of data to
exist as part of the same page (using some mechanism of attachment or
multipart).

-- daniel





Re: [Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-26 Thread MZMcBride
Daniel Kinzler wrote:
 To put it briefly: I want to remove the assumption that MediaWiki pages
 always contain wikitext. Instead, I propose a pluggable handler system for
 different types of content, similar to what we have for file uploads. So, I
 propose to associate a content model identifier with each page, and have
 handlers for each model that provide serialization, rendering, an editor, etc.

It's an ancient assumption that's built in to many parts of MediaWiki (and
many outside tools and scripts). Is there any kind of assessment about the
level of impact this would have?

For example, would the diff engine need to be rewritten so that people can
monitor these pages for vandalism? Will these pages be editable in the same
way as current wikitext pages? If not, will there be special editors for the
various data types? What other parts of the MediaWiki codebase will be
affected and to what extent? Will text still go in the text table or will
separate tables and infrastructure be used?

I'm reminded a little of LiquidThreads for some reason. This idea sounds
good, but I'm worried about the implementation details, particularly as the
assumption you seek to upend is so old and ingrained.

 The background is that the Wikidata project needs a way to store structured
 data (JSON) on wiki pages instead of wikitext. Having a pluggable system would
 solve that problem along with several others, like doing away with the special
 cases for JS/CSS, the ability to maintain categories etc separate from body
 text, manage Gadgets sanely on a wiki page, or several other things (see the
 link below).

How would this affect categories being stored in wikitext (alongside the
rest of the page content text)? That part doesn't make any sense to me.

MZMcBride





Re: [Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-26 Thread Platonides
I like the general idea (haven't gone through the detailed pages).


 On 26.03.2012 22:02, Brion Vibber wrote:
 * over time we'll want to do things like migrate File: pages from 'plain
 wikitext that happens to have an associated file' to 'structured data about
 a file'. This will be magnificent.
I think that File: pages that happen to be svg is a much easier approach.


 I'm a bit uncertain about the idea of 'multipart' pages, though attached
 data YES YES in some clean way is needed.
 
 That bit is mostly idle musing - multipart and attachments are *not* needed
 for Wikidata, though they open up several neat use cases.

It's just something to take into account when designing the extensibility.


 A very rough prototype is in a dev branch here:
 
   http://svn.wikimedia.org/svnroot/mediawiki/branches/Wikidata/phase3/

It looks really evil publishing that svn branch just days after the git
migration :)
I think that branch -created months ago- should be migrated to git, so
we could all despair..^W benefit from git's wonderful branching abilities.

Best regards




Re: [Wikitech-l] Cutting MediaWiki loose from wikitext

2012-03-26 Thread Tim Starling
On 27/03/12 01:45, Daniel Kinzler wrote:
 Hi all. I have a bold proposal (read: evil plan).
 
 To put it briefly: I want to remove the assumption that MediaWiki pages always
 contain wikitext. Instead, I propose a pluggable handler system for different
 types of content, similar to what we have for file uploads. So, I propose to
 associate a content model identifier with each page, and have handlers for
 each model that provide serialization, rendering, an editor, etc.

For the record: we've discussed this previously and I'm fine with it.
It's a well thought-out proposal, and the only request I had was to
ensure that the DB schema supports some similar projects that we have
in the idea pile, like multiple parser versions.

On 27/03/12 09:37, MZMcBride wrote:
 For example, would the diff engine need to be rewritten so that people can
 monitor these pages for vandalism? Will these pages be editable in the same
 way as current wikitext pages? If not, will there be special editors for the
 various data types?

These questions are all answered on the notes page that Daniel linked
to. The answers are yes, no and yes.

-- Tim Starling

