Lydia_Pintscher added a comment.
I assume this can be closed now?

TASK DETAIL
https://phabricator.wikimedia.org/T107595
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: brion, Lydia_Pintscher
Cc: Bianjiang, Nirmos, CCicalese_WMF, PokestarFan, Rical, Ayack, -jem-,
cscott added a comment.
If we use MCR for annotation storage, it would be useful to have a canonical URL for the contents of a specific slot. That might be an API URL, like https://en.wikipedia.org/api/rest_v1/page/html/Main_Page/749836961/ or else a user-visible URL like
Jdforrester-WMF added a comment.
Notes for the session right now: https://etherpad.wikimedia.org/p/devsummit17-multi-content-revisions
Jdforrester-WMF added a comment.
Just for clarity, as I've worked on this task but not actually commented, we in Editing see MCR as very important to our long-term plans. The use cases laid out at Multi-Content Revisions#Use Cases cover a lot, but I'll just pull out the four that we see as most
daniel added a comment.
In T107595#2791142, @TomT0m wrote:
Ok, I got confused. Does that mean that the documentation will not have its wikipage address anymore?
Yes, the documentation would be part of the template page proper, and would not have a separate title.
Would this then be possible
Tgr added a comment.
In T107595#2791067, @TomT0m wrote:
If I understand correctly, this feature will potentially allow viewing an article with the versions of the templates that existed at the time the wikitext was edited.
You might be thinking of Memento (which is not related to this in any
TomT0m added a comment.
In T107595#279, @daniel wrote:
@TomT0m No, Multi-Content-Revisions does not help with consistent display of old template revisions. Well, it does in cases where the use of templates is replaced by the use of slots - if e.g. template documentation was stored in a slot
daniel added a comment.
@TomT0m No, Multi-Content-Revisions does not help with consistent display of old template revisions. Well, it does in cases where the use of templates is replaced by the use of slots - if e.g. template documentation was stored in a slot instead of a subpage, you would
TomT0m added a comment.
Question: history of old articles
If I understand correctly, this feature will potentially allow viewing an article with the versions of the templates that existed at the time the wikitext was edited. Two questions then arise:
will that also work for deleted templates
daniel added a comment.
In T107595#2678497, @Alsee wrote:
In T107595#2667512, @daniel wrote:
What I take away from @Alsee's comment is that we should provide a more comprehensive and detailed overview of the use cases.
So the answer is no, no thought of investigating whether the editing
Lydia_Pintscher added a comment.
It's not true that we have not asked the community. Structured data for Commons has been asked for many many times. People are very happy with the progress we have made so far as can be seen for example here:
Alsee added a comment.
In T107595#2666114, @RobLa-WMF wrote:
In T107595#2666094, @Alsee wrote:
Did anyone consider that it might be a bad idea to start building a radical change to the editing environment without investigating whether the editing community wants this?
Each of the use cases
daniel added a comment.
In T107595#2675167, @Tgr wrote:
It might be helpful to split the use cases into ones where MCR is nice to have and those which need it. As I understand it, there are roughly three groups:
I'm missing the group "currently embedded in wikitext and would benefit from
Lydia_Pintscher added a comment.
In T107595#2675028, @RobLa-WMF wrote:
I may update the description of this task and of the RFC on mediawiki.org to say this. This answer isn't etched in stone, but when someone asks me "what is the MVP for Multi-Content Revisions", I'll say "structured media
Lydia_Pintscher added a comment.
In T107595#2675167, @Tgr wrote:
It might be helpful to split the use cases into ones where MCR is nice to have and those which need it. As I understand it, there are roughly three groups:
data that would otherwise be stored on separate pages but could be bundled
Lydia_Pintscher added a comment.
In T107595#2675028, @RobLa-WMF wrote:
Thanks for reminding us of this. You're obviously the primary contact from WMDE for this, but who is the product manager from WMDE whose work would be blocked if this is delayed? Is that @Lydia_Pintscher or someone else?
Tgr added a comment.
It might be helpful to split the use cases into ones where MCR is nice to have and those which need it. As I understand it, there are roughly three groups:
data that would otherwise be stored on separate pages but could be bundled into a single page for better UX: media
RobLa-WMF added a comment.
In T107595#2674782, @daniel wrote:
To me as a Wikidata developer, the "killer use case" is structured media info, but e.g. James, Mark, or Kaldari may have other priorities. The Wikidata team will provide a brief summary of the requirement and rationale for structured
daniel added a comment.
In T107595#2674606, @RobLa-WMF wrote:
Well, the "lot clearer" assertion remains to be seen. I think the current proposal still seems like an enormous change. I'm starting to wrap my head around it, but I can't fault many skeptics for questioning whether this represents a
RobLa-WMF added a comment.
In T107595#2671264, @daniel wrote:
In T107595#2668520, @RobLa-WMF wrote:
The risk: the more that our data formats become a complex mystery that is only understood by a handful of people, the fewer people that will trust the systems we produce.
Ah, yes, I agree. The
daniel added a comment.
In T107595#2668520, @RobLa-WMF wrote:
The risk: the more that our data formats become a complex mystery that is only understood by a handful of people, the fewer people that will trust the systems we produce.
Ah, yes, I agree. The structure of our content should be
Tgr added a comment.
In T107595#2667512, @daniel wrote:
I think little of that complexity should be exposed to users. We probably don't want editors to freely mix and match slots - rather, we want an integrated experience for editing and display. Ideally editors should neither know nor care
RobLa-WMF added a comment.
In T107595#2667512, @daniel wrote:
I think little of that complexity should be exposed to users. We probably don't want editors to freely mix and match slots - rather, we want an integrated experience for editing and display. Ideally editors should neither know nor
daniel added a comment.
@RobLa-WMF wrote
Now, it would seem as though you are bringing this point up now because you're worried about making the system more complicated. Yes, that seems like a reasonable fear. A multi-slot "revision" seems similar to a file system fork, and will inevitably
RobLa-WMF added a comment.
In T107595#2666094, @Alsee wrote:
Did anyone consider that it might be a bad idea to start building a radical change to the editing environment without investigating whether the editing community wants this?
Each of the use cases has had quite a bit of discussion,
Alsee added a comment.
My apologies, my intent wasn't to try to prove a case against MCR here. (Although I do understand why replies focused in that direction). Perhaps it would help if I shortened my previous comment:
Did anyone consider that it might be a bad idea to start building a radical
brion added a comment.
I wrote up some quick thoughts at https://www.mediawiki.org/wiki/User:Brion_VIBBER/MCR_alternative_thoughts
Mainly exploring along two lines:
what if we did a model with separate data tables for each new 'slot' instead of a common content-blob interface (possibly more
Tgr added a comment.
In T107595#2664438, @Pppery wrote:
Structured Media Data: What exactly is this separating license information from? This proposed change seems like it would lose some of the flexibility in file licenses
Flexibility means it is impossible to build assumptions into
Pppery added a comment.
Agree, Alsee. I don't find any of the use cases for this very compelling. Refutations of some of the use cases:
Structured Media Data: What exactly is this separating license information from? This proposed change seems like it would lose some of the flexibility in file
TomT0m added a comment.
In T107595#2664347, @Alsee wrote:
A page is simply a text file.
A page is definitely not a simple text file. It's written in a programming language - the wikitext and templates - which happens to have a textual representation. It also includes references to
Alsee added a comment.
Did anyone consider that it might be a bad idea to start building a radical change to the editing environment without investigating whether the editing community wants this? Ripping categories and templates and other stuff entirely out of the page?
Wiki operates on an
daniel added a comment.
@Pppery I'm referring to this mess: https://commons.wikimedia.org/w/index.php?title=File:L%C3%ADneas_de_Nazca,_Nazca,_Per%C3%BA,_2015-07-29,_DD_46.JPG
Here's an overview of the use cases for MCR: https://www.mediawiki.org/wiki/Multi-Content_Revisions#Use_Cases.
Pppery added a comment.
It's really just a gut feeling that this is a needlessly complex change. There is already a separate TemplateData editor that can be accessed when you click the edit link for a template. And I'm not sure what series of nested templates for file license you are referring
Pppery added a comment.
My concern is that this is just part of a general trend of making things more complicated than they need to be.
RobLa-WMF added a comment.
@daniel , I've added a stub at https://www.mediawiki.org/wiki/Requests_for_comment/Multi-content_revisions
Could you port the bulk of the prose of this RFC over there?
daniel added a comment.
In T107595#2456127, @RobLa-WMF wrote:
I'd like to discuss the state of this RFC in our planning meeting tomorrow (E227)
Sorry, couldn't make it to the meeting. Let's talk about it next week.
daniel added a comment.
Quick summary of a chat with @GWicke:
RevisionContentLookup, RESTBase, and Parsoid:
- There should be an implementation of RevisionContentLookup based on RESTBase
- RESTBase could provide Parsoid HTML "renders" as a "virtual slot".
- Each revision may have
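The split between stored primary slots and RESTBase-backed "virtual" slots can be sketched roughly as follows. This is purely illustrative Python: the class and method names are assumptions, not the actual MediaWiki interfaces under discussion.

```python
# Illustrative sketch only: these class and method names are assumptions,
# not MediaWiki's actual interfaces.

class RevisionContentLookup:
    """Resolve (title, revision, slot role) to content, dispatching
    stored primary slots and derived "virtual" slots differently."""

    def __init__(self, blob_store, restbase):
        self.blob_store = blob_store  # primary slots, written on save
        self.restbase = restbase      # virtual slots, rendered on demand

    def get_slot_content(self, title, rev_id, role):
        if role == "html":
            # Virtual slot: never stored locally, served by RESTBase.
            return self.restbase.get_html(title, rev_id)
        return self.blob_store.get(title, rev_id, role)
```

The point of the dispatch is that callers see one uniform lookup, while render-on-demand slots never touch the blob store at all.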
GWicke added a comment.
> As I understand it, restbase is a front-end caching proxy store, exposed to the public internet.
For most use cases (including HTML), it is actually *storing*, and not just
caching. It is the equivalent of ExternalStore and most of the text table,
including
brion added a comment.
In https://phabricator.wikimedia.org/T107595#2266131, @GWicke wrote:
> The use case for providing metadata is so that we can use stores like
RESTBase, which already provide an API keyed on title, revision & render ID. It
also already deals with the complexities
daniel added a comment.
In https://phabricator.wikimedia.org/T107595#2266131, @GWicke wrote:
> Basically, if we don't have a way to provide this key information to the
backend store, then we can't access all the multi-content revision data that's
already out there through this
GWicke added a comment.
The use case for providing metadata is so that we can use stores like
RESTBase, which already provide an API keyed on title, revision & render ID. It
also already deals with the complexities you mention.
Basically, if we don't have a way to provide this key
brion added a comment.
If I understand, the case for passing more metadata to the blob store is as a
hint for cross-blob data compression.
For this I think we mainly want to pass the identifier of a related blob: the
blob with the data from the same slot in the previous revision. If the
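The compression hint brion describes can be illustrated with zlib's preset-dictionary feature: the previous revision's blob for the same slot serves as the dictionary, so a lightly edited slot compresses to a handful of bytes. This is a sketch of the idea, not how MediaWiki's external storage actually works.

```python
import zlib

# Sketch of the "related blob" hint: compress a slot's new content using the
# previous revision's content for the same slot as a preset dictionary.
# (Illustrative only; not MediaWiki's actual storage code.)

def compress_with_hint(data: bytes, prev: bytes = b"") -> bytes:
    c = zlib.compressobj(zdict=prev)
    return c.compress(data) + c.flush()

def decompress_with_hint(blob: bytes, prev: bytes = b"") -> bytes:
    d = zlib.decompressobj(zdict=prev)
    return d.decompress(blob) + d.flush()
```

Note that the same hint must be available again at read time, which is exactly why the identifier of the related blob has to be stored alongside the new one.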
daniel added a comment.
In https://phabricator.wikimedia.org/T107595#2265186, @GWicke wrote:
>> In addition to title and revision (which I assume remains an integer), we'd need an optional v1 UUID parameter to retrieve specific renders, in both the request & response interfaces.
daniel added a comment.
@GWicke Perhaps some confusion is caused by us thinking of the storage backend in different terms. For me, RESTBase is a BlobStore. A BlobStore deals with binary data, which it stores and assigns URLs to, and which it can retrieve given such a URL. That's it.
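That BlobStore notion can be captured in a few lines. The method names below are stand-ins for whatever the real interface ends up being, and the in-memory dict stands in for an actual backend.

```python
import hashlib

# In-memory sketch; "save_blob"/"get_blob" are stand-ins for whatever the
# real BlobStore interface ends up being.

class BlobStore:
    def __init__(self):
        self._blobs = {}

    def save_blob(self, data: bytes) -> str:
        # Content-addressed: the same bytes always map to the same URL.
        url = "hash:sha256:" + hashlib.sha256(data).hexdigest()
        self._blobs[url] = data
        return url

    def get_blob(self, url: str) -> bytes:
        return self._blobs[url]
```

Content addressing is also what lets unchanged slots share a single blob across revisions.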
GWicke added a comment.
> In any case, the PageUpdater / WikiPage code needs to trigger notifications
(produce events). I don't care what mechanism it used for that. Or rather: I'm
very happy if we get a generalized mechanism. We'll have to agree on some kind
of schema for revisions, slots,
daniel added a comment.
In https://phabricator.wikimedia.org/T107595#2265137, @GWicke wrote:
> Makes sense, some of these fields won't change between revisions. Depending
on the constraints, it might still make

There is no redirection to maintain. The blob URL from the old revision
GWicke added a comment.
> Blobs would typically be shared by different revisions of the same page.
This happens every time one primary slot is edited, but another is not changed.
E.g. the free wikitext description of a file is edited, but the structured data
isn't (or vice versa). Or the
daniel added a comment.
In https://phabricator.wikimedia.org/T107595#2264799, @GWicke wrote:
> It is not entirely clear to me whether PageUpdater (and RevisionUpdater)
are meant to only handle synchronous low-level updates, or whether they are
meant to orchestrate asynchronous change
daniel added a comment.
//me notes that we are getting sidetracked here, and this could turn into a
separate RFC//
I'd rather have the Transaction object know about the database than the
other way around. Why should the database be in charge of transactions (other
than transactions
brion added a comment.
> This assumes the BlobStore will actually talk to the (same) database. I
would like to have Transaction separate from the DB stuff, so it can be used
just as well with files, or Cassandra, or Swift, or whatever we come up with to
store blobs. We shouldn't assume that
daniel added a comment.
In https://phabricator.wikimedia.org/T107595#2264511, @brion wrote:
> and internally in the BlobStore's save method, we add the rollback callback
straight onto the db object:
>
> That avoids having transaction state live separately in both the connection
and
brion added a comment.
In https://phabricator.wikimedia.org/T107595#2264334, @daniel wrote:
> We could (optionally?) provide a transaction context to the blob store like
this:
I kinda like that, yeah. Maybe extend Database with a transactional interface
that takes a callback:
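One possible shape for such a callback-taking transactional interface, as a hedged Python sketch: all names here are invented, not MediaWiki's actual API. The callback does the work; registered rollback actions run in reverse order if it throws.

```python
# Hedged sketch; "Transaction", "on_rollback" and "run" are invented names,
# not MediaWiki's actual API.

class Transaction:
    def __init__(self):
        self._rollbacks = []

    def on_rollback(self, fn):
        # Register compensation work (e.g. deleting an orphaned blob).
        self._rollbacks.append(fn)

    def run(self, work):
        # Run the callback; on failure, undo side effects in reverse order.
        try:
            result = work(self)
        except Exception:
            for fn in reversed(self._rollbacks):
                fn()
            raise
        return result
```

This keeps transaction state in one place instead of split between the connection and the blob store, which is the concern raised a few comments up.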
daniel added a comment.
In https://phabricator.wikimedia.org/T107595#2264302, @brion wrote:
> The remaining questions are
>
> - whether we want to pass the $dbw parameter through (do we always go
through load balancer in which case it'll be the same connection anyway? or are
there
daniel added a comment.
We could (optionally?) provide a transaction context to the blob store like
this:

  $trx = new Transaction();
  $trx->addDBConnection( $dbw );
  $trx->start();
  try {
      foreach ( $something as $thing ) {
          $url = $blobStore->saveBlob( $data, $trx );
      }
      $trx->commit();
  } catch ( Exception $ex ) {
      $trx->rollback();
      throw $ex;
  }
brion added a comment.
(if RevisionBuilder takes a $dbw param via constructor/factory, then the
question of the connection is easier)
brion added a comment.
> The above code would replace much of what is in the Revision class now, in
particular insertOn(). We can keep Revision around, but I'm not sure we can
provide b/c for insertOn().
b/c here looks relatively straightforward to me; it creates a new revision
with an
brion added a comment.
re this:

  $bs->deleteBlob( $dataUrl ); // dk: this goes wrong if the URL is content/hash based!

I think the return from this:

  $dataUrl = $bs->saveBlob( $content->serialize() );

needs to signal whether a blob was created or whether an existing blob
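Brion's point can be sketched like so: on a content-addressed store, saving reports whether anything was actually created, and deletion becomes reference counting. Illustrative Python only; the real interface and names are still being discussed above.

```python
import hashlib

# Sketch only; the real interface and names are still under discussion.

class DedupBlobStore:
    def __init__(self):
        self._blobs = {}
        self._refs = {}

    def save_blob(self, data: bytes):
        """Return (url, created): created is False when an identical blob
        already existed and was reused."""
        url = "hash:" + hashlib.sha256(data).hexdigest()
        created = url not in self._blobs
        self._blobs[url] = data
        self._refs[url] = self._refs.get(url, 0) + 1
        return url, created

    def release_blob(self, url: str):
        # Refcounted delete: the bytes go away only when no revision uses them.
        self._refs[url] -= 1
        if self._refs[url] == 0:
            del self._blobs[url]
            del self._refs[url]
```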
daniel added a comment.
Pseudo-code for `saveRevisionRecord()`:

  // assume we are in a db transaction
  $this->checkIsCurrentRevision( $this->baseRevision ); // protect against race condition
  $model = $slots['main']->getModel(); // "main" model must always be there
  $length
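The fragment above suggests roughly the following shape. This is a hedged Python reconstruction: everything beyond the quoted pseudo-code (parameter names, the record structure, the length rule) is invented, not MediaWiki's real implementation.

```python
# Everything beyond the quoted pseudo-code fragment is invented; this is a
# guess at the overall shape, not MediaWiki's real saveRevisionRecord().

def save_revision_record(slots, base_rev_id, current_rev_id, blob_store):
    # Protect against a race condition: fail if the page moved on under us.
    if base_rev_id != current_rev_id:
        raise RuntimeError("edit conflict: page was edited concurrently")
    # The "main" slot must always be present.
    if "main" not in slots:
        raise ValueError('missing required "main" slot')
    record = {"parent": base_rev_id, "slots": {}, "length": 0}
    for role, content in slots.items():
        record["slots"][role] = blob_store.save_blob(content)
        record["length"] += len(content)  # revision length sums all slots
    return record
```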
daniel added a comment.
Thanks for moving this forward, Brion!
Your code is pretty close to what I had in mind. I have repeated it below
with some changes marked `// dk`
In https://phabricator.wikimedia.org/T107595#2263968, @brion wrote:
> In MediaWiki in general we're pretty
brion added a comment.
Regarding transactional nature:
Assuming the backing blob storage continues to work on the model of the
current `text` table blobs with external storage backing, the "easy way" is to
allow extra backend blobs to leak in case of transaction rollback, and let them
daniel added a comment.
In https://phabricator.wikimedia.org/T107595#2250053, @GWicke wrote:
> Some notes:
>
> - PageUpdater aims to provide similar functionality as the change
propagation service (using EventBus) & the job queue. Could you clarify why we
need another mechanism for
GWicke added a comment.
Some notes:
- PageUpdater aims to provide similar functionality as the change propagation
service (using EventBus) & the job queue. Could you clarify why we need another
mechanism for change propagation?
- The blob store does not provide any locality
daniel added a comment.
Addendum to my brain dump in
https://phabricator.wikimedia.org/T107595#2235538:
One question I got stuck on was: how do we provide a transactional context to
the blob stores? We can have a RevisionBuilder with begin/commit, but when
that interacts with the
daniel added a comment.
In https://phabricator.wikimedia.org/T107595#2235621, @brion wrote:
> Thoughts:
>
> would RevisionContentLookup need both title and revision id in the same
lookup, or should we rely on database integrity for ids, and have a separate
lookup method as a
brion added a comment.
Ok in that case... I will trust nothing ;)
daniel added a comment.
@brion beware that the patches are old, stale, incomplete, and include dead
ends. And possibly some other dead things, in dark corners...
brion added a comment.
Aaa and now I see the bits in gerrit. I'll review all this tomorrow when I'm
a little bit rested. Hehehe
brion added a comment.
Ah great, that was mostly written before your post. ;) sounding good so
far... Do you have code fleshed out enough to share or should we take that
class structure and write fresh?
daniel added a comment.
@brion: y! I have been thinking about this a lot lately. I have done
some code experiments I would like to share and document. I'm pretty busy, but
I'll do my best to squeeze this in. Keep poking me :)
Veeery quick overview (mostly for my own good):
daniel added a comment.
@Yurik that was actually what I had in mind originally. We called it
multi-part content, like the MIME encoding for emails. The problem is that it
is not backwards compatible. It would break everything that expects to be able
to edit text via the action=editpage
RobLa-WMF added a comment.
In https://phabricator.wikimedia.org/T107595#2180530, @Yurik wrote:
> Can we solve some of the proposed use cases by simply wrapping "content" into
a higher level structure, e.g. json, to store multiple streams? For example,
for a hypothetical "tabular data",
Yurik added a comment.
Can we solve some of the proposed use cases by simply wrapping "content" into a
higher level structure, e.g. JSON, to store multiple streams? For example, for
a hypothetical "tabular data", we could have

  {
      "license": "...",
      "headers": [...],
Aklapper added a comment.
Wikimedia Developer Summit 2016 ended two weeks ago. This task is still open.
**If the session in this task took place**, please make sure 1) that the
session Etherpad notes are linked from this task, 2) that followup tasks for
any actions identified have been created
Spage added a comment.
You mention
- categories etc. maintained as structured, user editable data outside the
wikitext
(please spell out "etc." :-) ).
So a page's categories would be in an additional primary slot. But categories
are currently markup in the wikitext. If you ask for the
daniel added a comment.
@Spage if we have multiple content slots, we //can// store categories
separately. We can store them in a primary slot and edit them directly, or in a
derived slot (extracted from wikitext). Or we can leave things as they are. Or
we could allow people to enter categories
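The derived-slot option can be sketched as a simple extraction pass over the wikitext. The regex below is a simplification: real category links also allow localized namespace names and other forms beyond what it handles.

```python
import re

# Simplified sketch of a derived slot: categories pulled out of wikitext.
# Real category links also allow localized namespace names and variants
# beyond what this regex handles.

CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")

def derive_category_slot(wikitext):
    # Return category names in document order, ignoring any sort keys.
    return [m.group(1).strip() for m in CATEGORY_RE.finditer(wikitext)]
```

A derived slot like this would be recomputed from the main slot on save, whereas a primary category slot would be edited directly.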
Qgil added a subscriber: Qgil.
Qgil added a comment.
Congratulations! This is one of the 52 proposals that made it through the first
deadline of the
https://phabricator.wikimedia.org/tag/wikimedia-developer-summit-2016/
selection process
daniel added a comment.
In https://phabricator.wikimedia.org/T107595#1676115, @GWicke wrote:
> @daniel, your revised version seems to focus even more on implementing
> storage systems, change propagation etc, rather than defining a data access
> interface for MediaWiki, which can be backed by
Tobi_WMDE_SW added a subscriber: Tobi_WMDE_SW.
Tobi_WMDE_SW added a comment.
@daniel will do in the
https://phabricator.wikimedia.org/tag/wikidata-sprint-2015-09-29/:
formulate concrete questions to be discussed in the RfC meeting and do some
experimental coding.
GWicke added a comment.
@daniel, your revised version seems to focus even more on implementing storage
systems, change propagation etc, rather than defining a data access interface
for MediaWiki, which can be backed by services.
Could you clarify how you see this relate to ongoing efforts with