Hi,
How many characters does your software count in the word Mbɔ́tɛ?
The correct answer is 5.
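
To make the question concrete, here is a minimal Python sketch of why a
diffing tool can get this wrong. It assumes the word is stored as the six
code points M, b, U+0254, U+0301 (combining acute accent), t, U+025B, and
the helper names are hypothetical. Skipping combining marks is only an
approximation of user-perceived characters; full grapheme cluster
segmentation is defined by Unicode UAX #29.

```python
import unicodedata

# The word written with explicit escapes (assumed encoding):
# "ɔ" is U+0254, followed by U+0301 COMBINING ACUTE ACCENT; "ɛ" is U+025B.
WORD = "Mb\u0254\u0301t\u025b"


def count_code_points(text):
    """Count Unicode code points, which is what len() reports in Python 3."""
    return len(text)


def count_base_characters(text):
    """Count code points that are not combining marks.

    Approximation only: it does not handle every grapheme cluster rule
    (for example emoji joined with zero-width joiners).
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


def count_utf8_bytes(text):
    """Count bytes in the UTF-8 encoding, what a byte-oriented tool sees."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    print("code points:    ", count_code_points(WORD))
    print("base characters:", count_base_characters(WORD))
    print("UTF-8 bytes:    ", count_utf8_bytes(WORD))
```

Under those assumptions, a tool that diffs per code point or per byte may
split the accent from its base letter, so only the base-character count
matches the answer of 5.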
Thanks,
GerardM
2009/1/10 Greg Hewgill <[email protected]>
> 2009/1/8 Brion Vibber <[email protected]>:
> > Definitely of interest! If you haven't already, I'd love to see some
> > documentation on the format on mediawiki.org, and it'd be great if we
>
> I did some similar work a while ago using Python's difflib[1] as the
> diffing engine. Since difflib was much too slow when feeding it lists
> of single characters, I broke up the articles into sequences of words
> which improved the speed dramatically (but it's still not as fast as
> Robert's).
>
> My goal was slightly different, and rather than producing exact
> revision deltas I was looking for "blame" information[2]. I also
> computed the SHA1-matching graph of reverts, which calculates the
> shortest path between the current revision and the first one,
> consequently skipping over page-blanking events in most cases.
>
> The output for the first 1400 or so articles in enwiki can be found
> here:
> http://hewgill.com/~greg/wikiblame/
>
> I would be interested in adapting my blame processor to use a faster
> diffing algorithm, since it took my machine many hours to process
> those 1400 articles.
>
> [1]: http://python.org/doc/2.5/lib/module-difflib.html
> [2]: http://hewgill.com/journal/entries/461-wikipedia-blame
>
> Greg Hewgill
> http://hewgill.com
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>