Re: [Wikitech-l] From page history to sentence history

2011-01-21 Thread Aryeh Gregor
On Wed, Jan 19, 2011 at 4:15 PM, Anthony wikim...@inbox.org wrote:
 No, the question is why the relevant code is totally unrelated.

Well, you might ask why we don't just (selectively) dump the page,
revision, and text tables instead of doing XML dumps -- it seems like
it would be much simpler -- but I have no idea.  Perhaps it's to ease
processing with non-MediaWiki tools, but I'm not sure why that's a
design goal compared to the simplicity of SQL dumps.  Surely it
wouldn't be too hard to write a maintenance/ tool that just fetches
the revision text for a particular article at a particular point,
using only those three tables without any MediaWiki framework so it
can be used standalone.  Not to mention, the text table is immutable,
so creating and publishing text table dumps incrementally should be
trivial.
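A standalone fetcher of that sort might look like the sketch below. It is only a sketch: the table and column names follow the 2011-era MediaWiki schema (page, revision, text), an in-memory SQLite database stands in for the live backend, and the old_flags column (compression, external storage) is ignored entirely.

```python
import sqlite3

# Sketch of a framework-free revision fetcher: given a title and a
# timestamp, return the page text as of that moment using only the
# page, revision, and text tables (2011-era schema; old_flags ignored).
def fetch_text(conn, title, timestamp):
    row = conn.execute(
        """SELECT t.old_text
             FROM page p
             JOIN revision r ON r.rev_page = p.page_id
             JOIN text t     ON t.old_id = r.rev_text_id
            WHERE p.page_title = ? AND r.rev_timestamp <= ?
            ORDER BY r.rev_timestamp DESC LIMIT 1""",
        (title, timestamp)).fetchone()
    return row[0] if row else None

# In-memory stand-in for the real database, just to exercise the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page(page_id INTEGER, page_title TEXT);
    CREATE TABLE revision(rev_id INTEGER, rev_page INTEGER,
                          rev_text_id INTEGER, rev_timestamp TEXT);
    CREATE TABLE text(old_id INTEGER, old_text TEXT);
    INSERT INTO page VALUES (1, 'Paris');
    INSERT INTO revision VALUES (10, 1, 100, '20100101000000'),
                                (11, 1, 101, '20110101000000');
    INSERT INTO text VALUES (100, 'old text'), (101, 'new text');
""")
print(fetch_text(conn, 'Paris', '20100601000000'))  # prints "old text"
```

Against a real dump of those three tables the same query would apply unchanged, modulo the driver and the old_flags handling.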

But I'm not going to criticize anyone from the peanut gallery here.  I
don't actually know much about the dumps work.  Happy-melon is correct
to point out that it might not be trivial to snip private info (even
oversighted revisions) from the text table, depending on how it's
constructed.  There might be other concerns too.

 And there are lots of lower-priority things that are being done.  And
 lots of dollars sitting on the sidelines doing nothing.

That's a discussion for foundation-l, not wikitech-l.

On Thu, Jan 20, 2011 at 4:04 AM, Anthony wikim...@inbox.org wrote:
 It wouldn't be trivial, but it wouldn't be particularly hard either.
 Most of the work is already being done.  It's just being done
 inefficiently.

I'm glad to see you know what you're talking about here.  Presumably
you've examined the relevant code closely and determined exactly how
you'd implement the necessary changes in order to evaluate the
difficulty.  Needless to say, patches are welcome.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] From page history to sentence history

2011-01-21 Thread Anthony
On Fri, Jan 21, 2011 at 6:48 AM, Aryeh Gregor
simetrical+wikil...@gmail.com wrote:
 Not to mention, the text table is immutable,
 so creating and publishing text table dumps incrementally should be
 trivial.

The problem there is deletion and oversight.  The best solution if you
didn't have to worry about that would be to have a database on the
dump servers with only public data, which accesses a live feed (over
the LAN).  Then creating a dump would be as simple as pg_dump, and
fancier incremental dumps could be made relatively simply as well.

Then again, if your live feed tells you which revisions to
delete/oversight, that's still a viable solution.
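Given such a feed, the append-only nature of the text table makes the incremental step almost mechanical. A minimal sketch, with hypothetical names throughout (`suppressed` stands for whatever set of ids the feed reports as deleted or oversighted):

```python
# Incremental text-table dump, assuming the table is append-only: each
# run emits only rows past the last checkpoint, minus any ids the live
# feed has flagged as deleted or oversighted. All names are illustrative.
def incremental_dump(rows, checkpoint, suppressed):
    """rows: iterable of (old_id, old_text). Returns (new_rows, new_checkpoint)."""
    new = [(old_id, text) for old_id, text in rows
           if old_id > checkpoint and old_id not in suppressed]
    return new, max([checkpoint] + [old_id for old_id, _ in new])
```

Each run's output is simply appended to the published dump, and the checkpoint is carried forward to the next run.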

 On Thu, Jan 20, 2011 at 4:04 AM, Anthony wikim...@inbox.org wrote:
 It wouldn't be trivial, but it wouldn't be particularly hard either.
 Most of the work is already being done.  It's just being done
 inefficiently.

 I'm glad to see you know what you're talking about here.  Presumably
 you've examined the relevant code closely and determined exactly how
 you'd implement the necessary changes in order to evaluate the
 difficulty.  Needless to say, patches are welcome.

Access to the servers is welcome.  I can't possibly test and improve
performance without it.

Alternatively, give me a free live feed, and I'll make a decent dump
system here at home, and provide the source code when I'm done.



Re: [Wikitech-l] From page history to sentence history

2011-01-19 Thread Aryeh Gregor
On Wed, Jan 19, 2011 at 3:59 AM, Anthony wikim...@inbox.org wrote:
 Why isn't this being used for the dumps?

Well, the relevant code is totally unrelated, so the question is sort
of a non sequitur.  If you mean "Why don't we have incremental
dumps?", I guess Ariel is the person to ask.  I'm assuming the answer
is (as usual in software development) that there are higher-priority
things to do right now.  The concept of incremental dumps is pretty
obvious, but that doesn't mean it wouldn't take some manpower to get
them working.



Re: [Wikitech-l] From page history to sentence history

2011-01-19 Thread Anthony
On Wed, Jan 19, 2011 at 3:33 AM, Aryeh Gregor
simetrical+wikil...@gmail.com wrote:
 On Wed, Jan 19, 2011 at 3:59 AM, Anthony wikim...@inbox.org wrote:
 Why isn't this being used for the dumps?

 Well, the relevant code is totally unrelated, so the question is sort
 of a non sequitur.

No, the question is why the relevant code is totally unrelated.
Specifically, I'm talking about the full history dumps.

 If you mean "Why don't we have incremental dumps?"

No, that's not the question.  The question is why you are
uncompressing and undiffing (from DiffHistoryBlobs) only to recompress
(to bz2), and then uncompressing and recompressing again (to 7z), when
you could get roughly the same compression by just extracting the
blobs and removing any non-public data.  Or, if it's easier, continue
to uncompress (in gz) and undiff, then rediff and recompress (in gz),
as that will be much, much faster than compressing in bz2.

You'll also wind up with a full history dump which is *much* easier to
work with.  Yes, you'll break backward compatibility, but considering
that the English full history dump never finishes, even if you just
implemented it for that one it'd be better than the present, which is
to have nothing.

 I'm assuming the answer
 is (as usual in software development) that there are higher-priority
 things to do right now.

And there are lots of lower-priority things that are being done.  And
lots of dollars sitting on the sidelines doing nothing.



Re: [Wikitech-l] From page history to sentence history

2011-01-19 Thread Platonides
masti wrote:
 On 01/18/2011 12:30 AM, Lars Aronsson wrote:
 On 01/17/2011 11:36 PM, masti wrote:
 What is the reason, and what can it bring to the community?

 I tried to describe this. The task of finding out the
 history of a part of an article is very time consuming
 for long articles with a long history, where you have
 to manually look through lots of revisions that aren't
 related to the part of the article you are interested in.

 I took as the example the part of the flat geography
 of the city of Paris. Was this part controversial? Who
 edited it? Has it changed? When and by whom?

 Most edits to the article Paris are probably related to
 new elections, new buildings, new institutions. Most
 edits have nothing to do with the flat geography.
 So could the history view of maybe 5000 edits
 be quickly reduced down to 50 edits or even 5?


 In this rare situation it could be beneficial, but does it really make
 sense in general?  The workload and the added interface complexity are,
 in my opinion, not worth it.
 
 masti

I think it makes sense, but more as an external tool that selects the
edits for you.  There are tools like
http://wikipedia.ramselehof.de/wikiblame.php which aim to do this;
although I don't think they are very good, they may be a good place to
start.




Re: [Wikitech-l] From page history to sentence history

2011-01-19 Thread Happy-melon

Anthony wikim...@inbox.org wrote in message 
news:AANLkTi=uk+uf3y_b+zld57wcfuef_7rf-bt8tnvtg...@mail.gmail.com...
 No, that's not the question.  The question is why are you
 uncompressing and undiffing (from DiffHistoryBlobs) only to recompress
 (to bz2) and then uncompress and recompress (to 7z) when you can get
 roughly the same compression by just extracting the blobs and removing
 any non-public data.

That's probably not nearly as straightforward as it sounds.  RevDel'd and 
suppressed revisions are not removed from the text storage; even Oversighted 
revisions are left there, only the entry in the revision table is removed or 
altered.  I don't know OTTOMH how regularly the DiffHistoryBlob system 
stores a 'key frame', and how easy it would be to break diff chains in order 
to snip out non-public data from them, but I'd guess a) not very, and b) 
that the current code doesn't give any consideration to doing so because 
there's no reason for it to do so.  So refactoring it to incorporate that, 
while not impossible, is a non-trivial amount of work.
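The chain-breaking cost can be seen in a toy model. Plain unified diffs stand in here for DiffHistoryBlob's binary deltas, and the whole chain is naively rebuilt; the point is only that removing revision k forces the delta that led into revision k+1 to be recomputed against revision k-1:

```python
import difflib

# Toy forward-diff chain: one full-text "key frame" plus a delta per
# following revision. Unified word diffs stand in for binary deltas.
def make_delta(old, new):
    return list(difflib.unified_diff(old.split(), new.split(), lineterm=""))

def build_chain(revs):
    return revs[0], [make_delta(a, b) for a, b in zip(revs, revs[1:])]

def snip(revs, k):
    """Drop revision k. The delta that led into revision k+1 no longer
    applies and must be recomputed against revision k-1."""
    return build_chain(revs[:k] + revs[k + 1:])

revs = ["a b", "a b c", "a b c d"]
key, deltas = build_chain(revs)
_, deltas_after = snip(revs, 1)
print(deltas_after[0] != deltas[1])  # the surviving delta had to change
```

A real implementation could of course rebuild only the deltas downstream of the snipped revision, up to the next key frame, rather than the whole chain.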

 And there are lots of lower-priority things that are being done.  And
 lots of dollars sitting on the sidelines doing nothing.

Low-priority interesting things tend to get done when you have volunteers 
doing them.  While the value of some of the Foundation's expenditure is 
commonly debated, I think you'd struggle to argue that many of the WMF's 
dollars are not doing *anything*.

--HM 





Re: [Wikitech-l] From page history to sentence history

2011-01-19 Thread Anthony
On Wed, Jan 19, 2011 at 7:49 PM, Happy-melon happy-me...@live.com wrote:
 Anthony wikim...@inbox.org wrote in message
 news:AANLkTi=uk+uf3y_b+zld57wcfuef_7rf-bt8tnvtg...@mail.gmail.com...
 No, that's not the question.  The question is why are you
 uncompressing and undiffing (from DiffHistoryBlobs) only to recompress
 (to bz2) and then uncompress and recompress (to 7z) when you can get
 roughly the same compression by just extracting the blobs and removing
 any non-public data.

 That's probably not nearly as straightforward as it sounds.

I have no idea how straightforward it sounds, so I won't argue with that.

 RevDel'd and
 suppressed revisions are not removed from the text storage; even Oversighted
 revisions are left there, only the entry in the revision table is removed or
 altered.  I don't know OTTOMH how regularly the DiffHistoryBlob system
 stores a 'key frame', and how easy it would be to break diff chains in order
 to snip out non-public data from them, but I'd guess a) not very, and b)
 that the current code doesn't give any consideration to doing so because
 there's no reason for it to do so.  So refactoring it to incorporate that,
 while not impossible, is a non-trivial amount of work.

It wouldn't be trivial, but it wouldn't be particularly hard either.
Most of the work is already being done.  It's just being done
inefficiently.

On Wed, Jan 19, 2011 at 7:49 PM, Happy-melon happy-me...@live.com wrote:
 And there are lots of lower-priority things that are being done.  And
 lots of dollars sitting on the sidelines doing nothing.

 Low-priority interesting things tend to get done when you have volunteers
 doing them.  While the value of some of the Foundation's expenditure is
 commonly debated, I think you'd struggle to argue that many of the WMF's
 dollars are not doing *anything*.

Last I checked there were millions of them sitting in the bank.



Re: [Wikitech-l] From page history to sentence history

2011-01-18 Thread Aryeh Gregor
On Mon, Jan 17, 2011 at 9:12 PM, Roan Kattouw roan.katt...@gmail.com wrote:
 Wikimedia doesn't technically use delta compression. It concatenates a
 couple dozen adjacent revisions of the same page and compresses that
 (with gzip?), achieving very good compression ratios because there is
 a huge amount of duplication in, say, 20 adjacent revisions of
 [[Barack Obama]] (small changes to a large page, probably a few
 identical versions due to vandalism reverts, etc.).

We used to do this, but the problem was that many articles are much
larger than the compression window of typical compression algorithms,
so the redundancy between adjacent revisions wasn't helping
compression except for short articles.  Tim wrote a diff-based history
storage method (see DiffHistoryBlob in includes/HistoryBlob.php) and
deployed it on Wikimedia, for 93% space savings:

http://lists.wikimedia.org/pipermail/wikitech-l/2010-March/047231.html

I don't know if this was ever deployed to all of external storage,
though.  In that thread Tim mentioned only recompressing about 40% of
revisions, and said that the recompression script required care and
human attention to work correctly, so maybe he never got around to
recompressing all the rest -- I don't think he ever said, that I saw.



Re: [Wikitech-l] From page history to sentence history

2011-01-18 Thread Roan Kattouw
2011/1/19 Aryeh Gregor simetrical+wikil...@gmail.com:
 We used to do this, but the problem was that many articles are much
 larger than the compression window of typical compression algorithms,
 so the redundancy between adjacent revisions wasn't helping
 compression except for short articles.  Tim wrote a diff-based history
 storage method (see DiffHistoryBlob in includes/HistoryBlob.php) and
 deployed it on Wikimedia, for 93% space savings:

 http://lists.wikimedia.org/pipermail/wikitech-l/2010-March/047231.html

That's right, I forgot about that.

 I don't know if this was ever deployed to all of external storage,
 though.  In that thread Tim mentioned only recompressing about 40% of
 revisions, and said that the recompression script required care and
 human attention to work correctly, so maybe he never got around to
 recompressing all the rest -- I don't think he ever said, that I saw.

I think he finished recompressing a couple of months ago.

Roan Kattouw (Catrope)



Re: [Wikitech-l] From page history to sentence history

2011-01-18 Thread Alex Brollo
It may seem a completely different topic, but: is there something to
learn about text storage from the smart trick used to store TeX
formulas?  I did a bit of reverse engineering on that algorithm; I
never found any useful application for it, but it was much fun. :-)

Alex


Re: [Wikitech-l] From page history to sentence history

2011-01-17 Thread Anthony
On Sun, Jan 16, 2011 at 7:34 PM, Lars Aronsson l...@aronsson.se wrote:
 Many articles are soo long, and have been edited so many
 times, that the history view is almost useless. If I want
 to find out when and how the sentence "Overall, the city
 is relatively flat" in the article [[en:Paris]] has changed
 over time, I can sit all day and analyze individual diffs.

 I think it would be very useful if I could highlight a
 sentence, paragraph or section of an article and get a
 reduced history view with only those edits that changed
 that part of the page. What sorts of indexes would be needed
 to facilitate such a search? Has anybody already implemented
 this as a separate tool?

How would you define a particular sentence, paragraph or section of an
article?  The difficulty of the solution lies in answering that
question.



Re: [Wikitech-l] From page history to sentence history

2011-01-17 Thread Aryeh Gregor
On Mon, Jan 17, 2011 at 5:55 AM, Alex Brollo alex.bro...@gmail.com wrote:
 Before I dug a little more into wiki mysteries, I was absolutely sure
 that wiki articles were stored in small pieces (paragraphs?), so that a
 small edit to a very long page would take exactly the same disk space
 as a small edit to a short page.  But I soon discovered that things are
 different. :-)

Wikimedia stores diffs using delta compression, so actually this is
basically what happens.  The size of the edit is what determines the
size of the stored diff, not the size of the page.  (I don't know how
this works in detail, though.)  IIRC, default MediaWiki doesn't work
this way.



Re: [Wikitech-l] From page history to sentence history

2011-01-17 Thread Anthony
On Mon, Jan 17, 2011 at 10:40 AM, Alex Brollo alex.bro...@gmail.com wrote:
 2011/1/17 Bryan Tong Minh bryan.tongm...@gmail.com


 Difficult, but doable. Jan-Paul's sentence-level editing tool is able
 to make the distinction. It would perhaps be possible to use that as a
 framework for sentence-level diffs.


 Difficult, but the diff between versions of a page already does it.
 Looking at diffs between pages, I had firmly assumed that only the
 diffed paragraphs were stored, so that the page was rebuilt from
 updated diff segments.  I had no idea how this could be done, but it
 all seemed like magic!

Paragraphs are much easier to recognize than sentences, as wikitext
has a paragraph delimiter - a blank line.  To truly recognize
sentences, you basically have to engage in natural language
processing, though you can probably get it right 90% of the time
without too much effort.

And recognizing what's going on when a sentence changes *and* is
moved from one paragraph to another requires an even greater level of
natural language understanding.  Again though, you can probably get it
right most of the time without too much effort.

Wikitext actually makes it easier for the most part, as you can use
tricks such as the fact that the periods in [[I.M. Someone]] don't
represent sentence delimiters, since they are contained in square
brackets.  But not all periods which occur in the middle of a sentence
are contained in square brackets, and not all sentences end with a
period.

I'd say "difficult, but doable" is quite accurate, although with the
caveat that even the state of the art tools available today are
probably going to make mistakes that would be obvious to a human.  I'm
sure there are tools for this, and there are probably some decent ones
that are open source.  But it's not as simple as just adding an index.
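The square-bracket trick is easy to sketch. This naive splitter masks periods inside [[...]] links before splitting on sentence-final punctuation; it is nowhere near real NLP, and will still stumble on abbreviations outside links:

```python
import re

# Naive wikitext sentence splitter: hide periods inside [[...]] links,
# split on sentence-final punctuation, then restore the periods.
def split_sentences(wikitext):
    masked = re.sub(r"\[\[.*?\]\]",
                    lambda m: m.group(0).replace(".", "\x00"), wikitext)
    parts = re.split(r"(?<=[.!?])\s+", masked)
    return [p.replace("\x00", ".") for p in parts if p]

print(split_sentences("He met [[I.M. Someone]] in Paris. The city is flat."))
# ['He met [[I.M. Someone]] in Paris.', 'The city is flat.']
```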



Re: [Wikitech-l] From page history to sentence history

2011-01-17 Thread Anthony
On Mon, Jan 17, 2011 at 12:41 PM, Anthony wikim...@inbox.org wrote:
 And to recognize what's going on when a sentence changes *and* is
 moved from one paragraph to another, requires an even greater level of
 natural language understanding.  Again though, you can probably get it
 right most of the time without too much effort.

Or at the paragraph level, when two paragraphs are combined into one
(vs. one paragraph being deleted), or one paragraph is split into two
(vs. one paragraph being added), or any of the various other, more
complicated changes that take place.

If you want a high level of accuracy when trying to determine who
added a particular fact (such as "Overall, the city is relatively
flat", which may have started out as "Paris, in general, contains very
few changes in elevation"), you really need to combine automated tools
with human understanding.



Re: [Wikitech-l] From page history to sentence history

2011-01-17 Thread Roan Kattouw
2011/1/17 Aryeh Gregor simetrical+wikil...@gmail.com:
 Wikimedia stores diffs using delta compression, so actually this is
 basically what happens.  The size of the edit is what determines the
 size of the stored diff, not the size of the page.  (I don't know how
 this works in detail, though.)  IIRC, default MediaWiki doesn't work
 this way.

Wikimedia doesn't technically use delta compression. It concatenates a
couple dozen adjacent revisions of the same page and compresses that
(with gzip?), achieving very good compression ratios because there is
a huge amount of duplication in, say, 20 adjacent revisions of
[[Barack Obama]] (small changes to a large page, probably a few
identical versions due to vandalism reverts, etc.). However,
decompressing it just gets you the raw text, so nothing in this
storage system helps generation of diffs. Diff generation is still
done by shelling out to wikidiff2 (a custom C++ diff implementation
that generates diffs with HTML markup like <ins>/<del>) and caching
the result in memcached.
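The effect is easy to reproduce with zlib on synthetic revisions: twenty near-identical texts compress far better concatenated than one by one, because each revision can be encoded largely as a back-reference into the previous one. (The numbers are illustrative, not measurements of real Wikimedia data.)

```python
import random
import zlib

# Toy demo of concatenate-then-compress: 20 adjacent "revisions" that
# differ only in an appended sentence. Compressing the concatenation
# lets zlib encode each revision as a reference to the previous one.
random.seed(0)
words = ["paris", "seine", "city", "flat", "history", "geography", "edit"]
base = " ".join(random.choice(words) for _ in range(400))
revisions = [base + f" Edit number {i}." for i in range(20)]

together = len(zlib.compress("".join(revisions).encode()))
separate = sum(len(zlib.compress(r.encode())) for r in revisions)
print(f"concatenated: {together} bytes, one-by-one: {separate} bytes")
```

The window-size problem Aryeh mentions shows up when a single revision exceeds the compressor's window (32 KB for gzip/zlib): the previous revision then falls out of reach and the cross-revision redundancy is wasted.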

Roan Kattouw (Catrope)



Re: [Wikitech-l] From page history to sentence history

2011-01-17 Thread Lars Aronsson
On 01/17/2011 06:50 PM, Anthony wrote:
 If you want a high level of accuracy when trying to determine who
 added a particular fact (such as "Overall, the city is relatively
 flat", which may have started out as "Paris, in general, contains very
 few changes in elevation"), you really need to combine automated tools
 with human understanding.

Our current diff is not perfect; it often performs worse
than the GNU wdiff (word diff) utility. But it is still useful.
What I'm calling for is a way to filter out (or group together)
some of the edits from the history view that had nothing at all
to do with the specified sentence or paragraph. This shouldn't
be impossible to do. It need not be perfect. The more
irrelevant edits it can filter out, the better.

I'm a Unix programmer from the days of RCS, which is
functionally equivalent to the version control in MediaWiki.
In RCS, tracing when, how and by whom a particular piece of
code was altered (i.e., who introduced that bug) is as hard
as it now is in MediaWiki.  Do any of the newer systems (SVN,
Git, ...) or commercial integrated development environments
have better support for this?


-- 
   Lars Aronsson (l...@aronsson.se)
   Aronsson Datateknik - http://aronsson.se





Re: [Wikitech-l] From page history to sentence history

2011-01-17 Thread Lars Aronsson
On 01/17/2011 03:49 PM, Anthony wrote:
 How would you define a particular sentence, paragraph or section of an
 article?  The difficulty of the solution lies in answering that
 question.

I think the definition could vary, and the functionality could
still be useful. The API parameters could be the offset and
length in the given article version, just like substr().

A user interface (depending on skin) could input the offset
and length by point-and-click (region select) or by pointing
at a word and finding the preceding and following blank line.
Some user interface might care about sentence separators.

The search could be simplified if each edit preserved some
parameters of the diff, an edit index, e.g. "inserted 7
characters at offset 4711".  Then we know that this edit is
irrelevant if the sought offset is nowhere near 4711, and
as we go back in history, our offset needs to be reduced
by 7 if it is larger than 4711.  Doing such offset arithmetic
for a thousand article edits should be a lot faster than
calling diff over and over again.  And then again, the diffs
are necessary to build such an edit index.  This could be
done in a one-time conversion or on demand, using the edit
index as a cache of such parameters.
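Sketched in code, with a hypothetical edit-index format of (offset, chars_deleted, chars_inserted) triples per revision, newest first. Only insertions count as "touching" the offset in this sketch; a real index would also flag deletions at the offset:

```python
# Sketch of the proposed edit index: walk the history newest-first,
# shifting the offset of interest back through each edit's arithmetic.
# A revision is relevant only if one of its insertions covered the
# offset; everything else is filtered out without ever running diff.
def relevant_revisions(offset, history):
    """history: newest-first list of (rev_id, [(pos, deleted, inserted)])."""
    hits = []
    for rev_id, edits in history:
        touched = False
        for pos, deleted, inserted in sorted(edits, reverse=True):
            if offset < pos:
                continue                      # edit happened after our spot
            if offset < pos + inserted:
                touched = True                # this edit wrote our character
                offset = pos                  # best guess in the older text
            else:
                offset += deleted - inserted  # undo the edit's shift
        if touched:
            hits.append(rev_id)
    return hits

# The character at offset 50 was introduced by revision 1's insertion.
history = [(3, [(100, 0, 7)]), (2, [(40, 0, 10)]), (1, [(38, 0, 5)])]
print(relevant_revisions(50, history))  # [1]
```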


-- 
   Lars Aronsson (l...@aronsson.se)
   Aronsson Datateknik - http://aronsson.se





Re: [Wikitech-l] From page history to sentence history

2011-01-17 Thread masti
What is the reason, and what can it bring to the community?

masti

On 01/17/2011 01:34 AM, Lars Aronsson wrote:

 I think it would be very useful if I could highlight a
 sentence, paragraph or section of an article and get a
 reduced history view with only those edits that changed
 that part of the page. What sorts of indexes would be needed
 to facilitate such a search? Has anybody already implemented
 this as a separate tool?







Re: [Wikitech-l] From page history to sentence history

2011-01-17 Thread Lars Aronsson
On 01/17/2011 11:36 PM, masti wrote:
 What is the reason, and what can it bring to the community?

I tried to describe this. The task of finding out the
history of a part of an article is very time consuming
for long articles with a long history, where you have
to manually look through lots of revisions that aren't
related to the part of the article you are interested in.

I took as the example the part of the flat geography
of the city of Paris. Was this part controversial? Who
edited it? Has it changed? When and by whom?

Most edits to the article Paris are probably related to
new elections, new buildings, new institutions. Most
edits have nothing to do with the flat geography.
So could the history view of maybe 5000 edits
be quickly reduced down to 50 edits or even 5?


-- 
   Lars Aronsson (l...@aronsson.se)
   Aronsson Datateknik - http://aronsson.se





Re: [Wikitech-l] From page history to sentence history

2011-01-17 Thread masti
On 01/18/2011 12:30 AM, Lars Aronsson wrote:
 On 01/17/2011 11:36 PM, masti wrote:
 What is the reason, and what can it bring to the community?

 I tried to describe this. The task of finding out the
 history of a part of an article is very time consuming
 for long articles with a long history, where you have
 to manually look through lots of revisions that aren't
 related to the part of the article you are interested in.

 I took as the example the part of the flat geography
 of the city of Paris. Was this part controversial? Who
 edited it? Has it changed? When and by whom?

 Most edits to the article Paris are probably related to
 new elections, new buildings, new institutions. Most
 edits have nothing to do with the flat geography.
 So could the history view of maybe 5000 edits
 be quickly reduced down to 50 edits or even 5?


In this rare situation it could be beneficial, but does it really make
sense in general?  The workload and the added interface complexity
are, in my opinion, not worth it.


masti



[Wikitech-l] From page history to sentence history

2011-01-16 Thread Lars Aronsson
In the early days, one could follow Recent changes on
a daily basis, to see if anything had changed. Nowadays
watchlists reduce the amount of information to those
pages one is interested in.

Many articles are soo long, and have been edited so many
times, that the history view is almost useless. If I want
to find out when and how the sentence "Overall, the city
is relatively flat" in the article [[en:Paris]] has changed
over time, I can sit all day and analyze individual diffs.

I think it would be very useful if I could highlight a
sentence, paragraph or section of an article and get a
reduced history view with only those edits that changed
that part of the page. What sorts of indexes would be needed
to facilitate such a search? Has anybody already implemented
this as a separate tool?



-- 
   Lars Aronsson (l...@aronsson.se)
   Aronsson Datateknik - http://aronsson.se





Re: [Wikitech-l] From page history to sentence history

2011-01-16 Thread Benjamin Lees
On Sun, Jan 16, 2011 at 7:34 PM, Lars Aronsson l...@aronsson.se wrote:
 I think it would be very useful if I could highlight a
 sentence, paragraph or section of an article and get a
 reduced history view with only those edits that changed
 that part of the page. What sorts of indexes would be needed
 to facilitate such a search? Has anybody already implemented
 this as a separate tool?

The New York Times has made some progress in this area.[0] Of course,
their articles don't get edited the way Wikipedia ones do...

I've tried using WikiBlame[1] a few times, but it operates at the
level of strings, rather than sections/paragraphs/sentences, so, like
you, I'm left to do most of my digging by hand.  Glad to know I'm not
alone in my pain. :-)

[0] http://open.blogs.nytimes.com/2011/01/11/emphasis-update-and-source/
[1] http://en.wikipedia.org/wiki/User:Flominator/WikiBlame
