Hi, you can also look at our WikiWho code. In our tests it extracted the
changes between revisions considerably faster than a simple diff. See here:
https://github.com/maribelacosta/wikiwho . You would have to adapt the code a
bit to get the pure diffs, though. Let me know if you need
The deltas library implements roughly the WikiWho strategy, with a
difflib-style interface, as "SegmentMatcher".
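
For comparison, here is a minimal sketch of the "simple diff" baseline using
only Python's standard difflib. It is not WikiWho or the deltas
SegmentMatcher; the tokenize and token_diff helpers are hypothetical names
made up for illustration, and the regex tokenizer is an assumption rather
than what either project uses.

```python
import difflib
import re


def tokenize(text):
    # Split into word tokens, whitespace runs, and single punctuation marks.
    # This tokenization is an assumption for illustration only.
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def token_diff(old_text, new_text):
    """Return the non-equal operations between two revision texts."""
    a = tokenize(old_text)
    b = tokenize(new_text)
    # autojunk=False disables difflib's popularity heuristic, which can
    # otherwise skip very frequent tokens in long sequences.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    ops = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            ops.append((tag, "".join(a[a1:a2]), "".join(b[b1:b2])))
    return ops


if __name__ == "__main__":
    # Two toy "revisions" of the same sentence.
    for op in token_diff("The cat sat.", "The dog sat."):
        print(op)
```

Timing this over a sample of revision pairs gives a baseline to compare
WikiWho or deltas against.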
Re. diffs, I have some datasets that I have generated and can share. Would
enwiki-20150602 be recent enough for your uses?
If not, then I'd also like to point you to
Thanks for all the suggestions you shared!
@Aaron, it would be great if you could share the dataset you have with me. I
think 20150602 is recent enough. In the meantime, I will explore the
utilities you mentioned. I think they are good tools to learn and practice with.
Thanks!
On Wed, Jan 20, 2016 at 9:20 AM,
Hello all,
I am a 2nd-year PhD student working in the GroupLens Research group at the
University of Minnesota - Twin Cities. Recently, I have been working on a project
to study how identity-based and bond-based theories can help explain
editors' behavior in WikiProjects within the group context, but I am
Hi Bowen,
You might compare the performance of Aaron Halfaker's deltas library:
https://github.com/halfak/deltas
(You might have already done so, I guess, but just in case)
In either case, I suspect the tasks will need to be parallelized to be
achieved in a reasonable time scale. How many