Re: [Wikitech-l] need review and co-mentor volunteers for GSoC Accuracy review proposal

2015-03-05 Thread Quim Gil
James Salsman's proposal for Google Summer of Code is being discussed at
https://phabricator.wikimedia.org/T89416 and I think we need feedback
from more people. In https://phabricator.wikimedia.org/T89416#1087673 I
have summarized my opinion against including this project until it
receives community backing. James's arguments must also be considered,
though, and I think we are in a situation where more voices are welcome.


On Fri, Feb 13, 2015 at 7:31 PM, Risker <risker...@gmail.com> wrote:


 While I have no doubt that the WMF and the Wikimedia community care about
 the accuracy of articles, there's no basis to believe that this project
 would have any effect on said accuracy, or that it will actually identify
 inaccuracies in the text; it looks for old edits that haven't been
 revised using keywords that may or may not have any relevance to the
 accuracy of the information. It would require tens of thousands of
 person-hours (if not more) to analyse the data, and not a single article
 would be improved. Your proposal requires massive time commitment from
 reviewers of the data obtained in order to assess whether or not an update
 should be requested; it doesn't even fix out-of-date information. There is
 no indication at all that there is any interest on the part of Wikipedians
 to review data identified in the manner you propose.

 Risker/Anne



-- 
Quim Gil
Engineering Community Manager @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] need review and co-mentor volunteers for GSoC Accuracy review proposal

2015-02-13 Thread Risker
I'd suggest, James, that relying on suggestions from a
six-year-old strategy document when we're about to start a new strategic
session isn't the best course of action.

I'd also ask what exactly the plan is for doing something with this
information. Collecting lists of things that might no longer be up to date,
when there is no corresponding action plan for updating that information,
is probably not a good use of anyone's time or effort.

Risker/Anne

On 13 February 2015 at 07:57, James Salsman <jsals...@gmail.com> wrote:

 Brian Wolff wrote:

  Have you run this by Wikipedians? ... since it involves adding a
  bunch of templates

 The 2009 strategy proposal linked towards the end, of which the
 GSoC proposal is a limited subset, did not get any criticism after several
 high-profile opportunities. Both proposals are intended to be language- and
 keyword-neutral. It would be best if the initial bot were built and tested
 on a wiki other than the English Wikipedia, so that the Bot Approvals Group
 can get concrete answers to any questions it might have.

 The use of templates should be optional; one way to do that would be to
 allow the use of a mirroring namespace to hold the templates, instead of
 the primary namespace. But there is probably a better way. Thank you for
 something so interesting to think about.

  Prepare a table of each word in article dumps indicating its age.

  This in itself is a non-trivial problem (for a gsoc student anyways),

 It would be non-trivial for a large production dump, but for a small
 subset of articles in a given dump, there are deterministic algorithms
 that perform with sufficient accuracy to form the specified partial basis
 of a selection heuristic. Creating such a table is at worst O(N) in
 revisions, but there are ways to hash words with their N-gram contexts so
 that moved and blanked text is more likely to be handled correctly than
 raw diffs would suggest. This is equivalent to the general blame problem,
 and I look forward to explaining the history of the problem (see, e.g.,
 http://wikitrust.soe.ucsc.edu/talks-and-papers ) to show why the N-gram
 hash solution is best.
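The word-age table and the N-gram hashing idea above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not WikiTrust's algorithm or the proposal's final design: revisions are assumed to be available as chronologically ordered (timestamp, plain text) pairs, tokens are split on whitespace, and the helper name `word_ages` and the default `n=3` are hypothetical. Each word's age is the earliest timestamp at which the n-gram ending at that word appeared in any revision, so text that is moved, or blanked and later restored, keeps its original age. Run time grows with the total number of words across all revisions (times n).

```python
from typing import Dict, List, Sequence, Tuple


def word_ages(revisions: Sequence[Tuple[int, str]], n: int = 3) -> List[Tuple[str, int]]:
    """Hypothetical helper: return (word, first_seen_timestamp) for each word
    of the latest revision.

    revisions: (timestamp, plain_text) pairs in chronological order.
    A word's age is the timestamp at which the n-gram ending at that word
    (the word plus up to n - 1 preceding words) first appeared in any revision.
    """
    if not revisions:
        return []
    first_seen: Dict[Tuple[str, ...], int] = {}
    for timestamp, text in revisions:
        words = text.split()
        for i in range(len(words)):
            gram = tuple(words[max(0, i - n + 1): i + 1])
            # setdefault keeps the earliest timestamp, so text that is moved,
            # or blanked and later restored, keeps its original age.
            first_seen.setdefault(gram, timestamp)
    latest = revisions[-1][1].split()
    return [
        (word, first_seen[tuple(latest[max(0, i - n + 1): i + 1])])
        for i, word in enumerate(latest)
    ]


# Text blanked in the second revision and restored in the third keeps its age.
history = [(2005, "population 1,234,567 in 2005"),
           (2010, "(blanked)"),
           (2015, "population 1,234,567 in 2005")]
print(word_ages(history))
# [('population', 2005), ('1,234,567', 2005), ('in', 2005), ('2005', 2005)]
```

One weakness of this sketch: the first n - 1 words of a moved passage get new n-gram contexts at their destination, so they can look newer than they are; a production version would need more careful handling of chunk boundaries.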

  Convert flagged passages to GIFT questions for review and
  present them to one or more subscribed reviewers
 
  Wouldn't you want to give the reviewers an actual form where
  they can fill out the questions?

 Yes, and I want to store questions in GIFT format to allow follow-on
 integration with the Global Learning Xprize Meta-Team deliverables.
 Presenting a GIFT question means converting it to a form instead of just
 displaying it as markup. Whether direct integration is a reasonable
 follow-on goal depends on the extent to which branching-scenario
 interactive fiction role-play content can be created automatically.
 Examples of such content are shown in
 http://www.capuano.biz/papers/EL_2014.pdf
 and http://talknicer.com/GLMORS_2014.pdf
 I think it can be, and I look forward to discussing the matter in detail
 with co-mentor volunteers.
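As a rough illustration of storing a flagged passage as a GIFT question, here is a sketch that emits a GIFT essay question: an optional `::title::`, the question text, and an empty `{}` answer block, with GIFT's special characters (~ = # { } :) escaped with a backslash. The choice of the essay type, the prompt wording, and the helper names `gift_escape` and `passage_to_gift` are assumptions for illustration; the proposal does not say which GIFT question type it would use.

```python
# GIFT treats these characters as markup; literal occurrences in question
# text must be escaped with a backslash.
GIFT_SPECIAL = set("~=#{}:")


def gift_escape(text: str) -> str:
    """Backslash-escape GIFT special characters."""
    return "".join("\\" + ch if ch in GIFT_SPECIAL else ch for ch in text)


def passage_to_gift(question_id: str, article: str, passage: str) -> str:
    """Hypothetical helper: wrap a flagged passage as a GIFT essay question.

    An empty answer block "{}" is GIFT's essay (free-text) type, which fits
    review because no answer is known to be correct in advance.
    """
    title = gift_escape(f"{question_id} {article}")
    prompt = gift_escape(
        f'Is the following passage from "{article}" still accurate, and if not, '
        f'what should it say (with a source)? "{passage}"'
    )
    return f"::{title}:: {prompt} {{}}"


print(passage_to_gift("AR-1", "Example region", "Population: 1,234,567 (2005)"))
```

Keeping the stored form as plain GIFT text leaves the presentation layer free to render it however it likes.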


Re: [Wikitech-l] need review and co-mentor volunteers for GSoC Accuracy review proposal

2015-02-13 Thread James Salsman
Risker wrote:

... relying on suggestions from a six-year-old strategy document
 when we're about to start a new strategic session, isn't the best
 course of action.

A strategy proposal which never garnered criticism after so many
opportunities would seem to qualify as at least an emergent strategy
within the meaning of the slide and narrative at
https://www.youtube.com/watch?v=N4Kvj5vCaW0&t=19m30s

Furthermore, the initial limited subtask would be much more difficult
to evaluate as a strategy without a working prototype, including for
the Bot Approvals Group (BAG), which demands working code before making
a final decision on implementation. Trying to second-guess the BAG is
presumptuous.

Is it possible that supporting updates to out-of-date articles would
not be part of any successful strategy for the Foundation? Over the past
several months I have posted multiple series of statistics to
wiki-research-l proving that quality issues are shifting from creating
new content to maintaining old content, and I will be happy to
recapitulate them should anyone suggest that it might not be.

 what exactly is the plan for doing something with this information.

It will be made available to volunteers as a backlog list, which
community members may or may not choose to work on. The Foundation
can't prescribe mandatory content improvement work without putting the
safe harbor provisions in jeopardy. Volunteers will be attracted to
working on such updates to the extent that they see them as a worthy
use of their editing time.

I have additional detailed plans for testing which I will be happy to
discuss with interested co-mentors, because depending on available
resources there could be a way to eliminate substantial duplication of
effort.

I have updated the synopses at https://www.mediawiki.org/wiki/Accuracy_review
and https://phabricator.wikimedia.org/T89416

Best regards,
James Salsman


Re: [Wikitech-l] need review and co-mentor volunteers for GSoC Accuracy review proposal

2015-02-13 Thread Risker
James, it received a single support vote.  It was not included in any
final strategy documents.  I think it could more accurately be described as
something not worthy of the attention of the Wikimedia community.

While I have no doubt that the WMF and the Wikimedia community care about
the accuracy of articles, there's no basis to believe that this project
would have any effect on said accuracy, or that it will actually identify
inaccuracies in the text; it uses keywords that may or may not have any
relevance to the accuracy of the information to look for old edits that
haven't been revised. It would require tens of thousands of
person-hours (if not more) to analyse the data, and not a single article
would be improved. Your proposal requires massive time commitment from
reviewers of the data obtained in order to assess whether or not an update
should be requested; it doesn't even fix out-of-date information. There is
no indication at all that there is any interest on the part of Wikipedians
to review data identified in the manner you propose.

Risker/Anne


On 13 February 2015 at 12:58, James Salsman <jsals...@gmail.com> wrote:

 Risker wrote:
 
 ... relying on suggestions from a six-year-old strategy document
  when we're about to start a new strategic session, isn't the best
  course of action.

 A strategy proposal which never garnered criticism after so many
 opportunities would seem to qualify as at least an emergent strategy
 within the meaning of the slide and narrative at
 https://www.youtube.com/watch?v=N4Kvj5vCaW0&t=19m30s

 Furthermore, the initial limited subtask would be much more difficult
 to evaluate as a strategy without a working prototype, including by
 the Bot Approvals Group which demands working code before making a
 final decision on implementation. Trying to second guess the BAG is
 presumptuous.

 Is it possible that supporting updates to out of date articles would
 not be part of any successful strategy for the Foundation? I have
 posted multiple series of statistics to wiki-research-l in the past
 several months proving that quality issues are transitioning from
 creating new content to maintaining old content, and will be happy to
 recapitulate them should anyone suggest that they think it could be.

  what exactly is the plan for doing something with this information.

 It will be made available to volunteers as a backlog list which
 community members may or may not choose to work on. The Foundation
 can't prescribe mandatory content improvement work without putting the
 safe harbor provisions in jeopardy. Volunteers will be attracted to
 working on such updates in proportion to the extent they see them as
 being a worthy use of their editing time.

 I have additional detailed plans for testing which I will be happy to
 discuss with interested co-mentors, because depending on available
 resources there could be a way to eliminate substantial duplication of
 effort.

 I have updated the synopses at
 https://www.mediawiki.org/wiki/Accuracy_review
 and https://phabricator.wikimedia.org/T89416

 Best regards,
 James Salsman


Re: [Wikitech-l] need review and co-mentor volunteers for GSoC Accuracy review proposal

2015-02-13 Thread James Salsman
Risker wrote:

... it received a single support vote

There are two supporters, including myself, who indicated they are
willing to work on it, and it also received support at
https://strategy.wikimedia.org/wiki/Favorites/Lodewijk

Many of the implemented proposals received less support in the formal
process, for example:
https://strategy.wikimedia.org/wiki/Proposal:Foundation-Announce-l
https://strategy.wikimedia.org/wiki/Proposal:Create_Wikisource_for_Yiddish
https://strategy.wikimedia.org/wiki/Proposal:Allow_IPs_to_edit_sections_on_English_Wikipedia_(done)
https://strategy.wikimedia.org/wiki/Proposal_talk:Implement_secret_ballots_(Done)
https://strategy.wikimedia.org/wiki/Proposal:IPhone/iPod_Touch_Offical_Wikipedia_App_(Done)
https://strategy.wikimedia.org/wiki/Proposal:Mobiltelefonversion_von_Wikipedia_(Done)
... and at least four more among those I have looked through so far.

Moreover, according to the vote scoring system, I believe it ranked in
the top 8% out of several hundred proposals, although that information
is apparently no longer available.

... there's no basis to believe that this ... will actually identify
 inaccuracies in the text

If you find an article about a geographic region with the words
"population 1,234,567" or "gross national product" in the same
grammatical clause as a number, and you know that text was inserted
10 years ago, do you believe you have not found a likely out-of-date
inaccuracy? What reason could there possibly be to believe otherwise?
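The heuristic in the question above can be sketched roughly as follows. As a crude stand-in for the DELPH-IN parser named in the proposal, this hypothetical `stale_clauses` helper approximates a grammatical clause by splitting on commas, semicolons, or periods followed by whitespace (so "1,234,567" stays intact). The keyword list and the 10-year threshold come from the example in this message; the insertion year is assumed to come from something like a word-age table.

```python
import re
from typing import Iterable, List

# A number such as 1,234,567 or 3.5 (commas inside numbers have no space after them).
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")
# Crude clause boundary: a comma, semicolon, or period followed by whitespace.
CLAUSE_BOUNDARY = re.compile(r"[,;.]\s+")


def stale_clauses(passage: str, year_inserted: int, current_year: int,
                  keywords: Iterable[str] = ("population", "gross national product"),
                  min_age_years: int = 10) -> List[str]:
    """Hypothetical helper: return clauses that pair a keyword with a number,
    if the passage has gone at least min_age_years without being updated."""
    if current_year - year_inserted < min_age_years:
        return []
    lowered = [k.lower() for k in keywords]
    flagged = []
    for clause in CLAUSE_BOUNDARY.split(passage):
        if any(k in clause.lower() for k in lowered) and NUMBER.search(clause):
            flagged.append(clause.strip())
    return flagged


passage = ("The region is mountainous. Its population was 1,234,567 in 2004, "
           "according to the census.")
print(stale_clauses(passage, year_inserted=2005, current_year=2015))
# ['Its population was 1,234,567 in 2004']
```

A real parser would catch clauses this splitting misses (and avoid splitting on abbreviations such as "e.g."), but even this crude version shows the selection is mechanical and cheap to run.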

... It would require tens of thousands of person-hours (if not more) to
 analyse the data, and not a single article would be improved.

On the contrary, we can try it on 100 randomly selected vital
articles, and if we don't have enough data to make an extrapolation
with useful confidence intervals, we can try it on a slightly larger
sample of them. This is something the GSoC students can do themselves,
without any volunteer support. But what reason is there to believe
that such support won't be forthcoming if requested from the
copyeditors' guild or a similar WikiProject, for example?
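To put numbers on "useful confidence intervals": if k of n sampled flagged passages turn out, on review, to be genuinely out of date, a Wilson score interval gives a plausible range for the true proportion, and the usual normal-approximation formula estimates how large a sample is needed for a given margin. The choice of the Wilson interval, the 95% level (z = 1.96), and the function names are assumptions of this sketch, not part of the proposal.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def sample_size_for_margin(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Normal-approximation sample size so the half-width is at most `margin`
    (p = 0.5 is the worst case)."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


# e.g. 50 of 100 sampled passages confirmed out of date:
print(wilson_interval(50, 100))      # about (0.404, 0.596)
print(sample_size_for_margin(0.05))  # 385
```

In other words, a first sample of 100 gives roughly a plus-or-minus 10 point interval in the worst case; narrowing that to 5 points would take a few hundred reviewed passages.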

... Your proposal requires massive time commitment from reviewers

Why would it require any more time commitment than the existing 17,200
articles in [[Category:Wikipedia articles needing factual
verification]]? Where is the requirement? Volunteer editors are free
to spend their time in the manner which they believe will best serve
improvements.

... it doesn't even fix out-of-date information.

Do you think actual fact checking should be done by people or bots?

... There is no indication at all that there is any interest on the part
 of Wikipedians to review data identified in the manner you propose.

Most of the WP:BACKLOG categories have articles entering and exiting
them every day. What reason is there to believe that articles selected
by an automated accuracy review process would be any different?

... there's no basis to believe that this project would have any
 effect on accuracy

Even if you had airtight evidence that this was incontrovertibly true
(and for the reasons above, there can obviously be no such evidence),
wouldn't there still be only one way to find out?

Best regards,
James Salsman



Re: [Wikitech-l] need review and co-mentor volunteers for GSoC Accuracy review proposal

2015-02-13 Thread Brian Wolff

 Furthermore, the initial limited subtask would be much more difficult
 to evaluate as a strategy without a working prototype, including by
 the Bot Approvals Group which demands working code before making a
 final decision on implementation. Trying to second guess the BAG is
 presumptuous.

I'm not saying you need formal BAG approval before you begin.
I'm saying you should have an informal discussion on the Village Pump or
somewhere to make sure that relevant stakeholders think the idea would
potentially be useful in principle.

Six years is a long time; people change, things change, especially for a
proposal that, while it didn't garner a lot of opposition, didn't garner
people jumping up and down in support either.

--bawolff

[Wikitech-l] need review and co-mentor volunteers for GSoC Accuracy review proposal

2015-02-12 Thread James Salsman
I invite review of this preliminary proposal for a Google Summer of
Code project:
 http://www.mediawiki.org/wiki/Accuracy_review

If you would like to co-mentor this project, please sign up. I've been
a GSoC mentor every year since 2010 and successfully mentored two
students in 2012, whose work has become academically relevant,
including in languages I cannot read (e.g.,
http://talknicer.com/turkish-tablet.pdf ). I am most interested in
co-mentors at the WMF or the Wiki Education Foundation involved with
engineering, design, or education.

Synopsis:

Create a Pywikibot bot to find articles in given categories, category
trees, and lists. For each such article, add in-line templates to
indicate the location of passages with (1) facts and statistics which
are likely to have become out of date and have not been updated in a
given number of years, and (2) phrases which are likely unclear. Use a
customizable set of keywords and the DELPH-IN LOGON parser
[http://erg.delph-in.net/logon] to find such passages for review.
Prepare a table giving the age of each word in the article dumps.
Convert flagged passages to GIFT questions
[http://microformats.org/wiki/gift] for review and present them to one
or more subscribed reviewers. Update the source template with the
reviewer(s)' answers to the GIFT question, but keep the original text
as part of the template. When reviewers disagree, update the template
to reflect that fact, and present the question to a third reviewer to
break the tie.
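The review workflow in the synopsis (keep the original text in the template, record each reviewer's answer, and escalate disagreements to another reviewer) could look something like the sketch below. The template name `{{Accuracy review}}`, its parameter names, and the majority rule are hypothetical. Answers are assumed to be short verdicts such as "still accurate" or "outdated", since free-text answers would rarely match exactly. Literal pipes are escaped with MediaWiki's `{{!}}` magic word so they do not split template parameters.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def review_status(answers: Sequence[str]) -> Tuple[str, Optional[str]]:
    """Hypothetical rule: a single answer is recorded as-is; with several
    answers, a strict majority wins; otherwise the question is disputed
    and should be shown to another reviewer to break the tie."""
    normalized = [a.strip().lower() for a in answers]
    if not normalized:
        return "pending", None
    if len(normalized) == 1:
        return "answered", normalized[0]
    answer, count = Counter(normalized).most_common(1)[0]
    if count == len(normalized):
        return "agreed", answer
    if 2 * count > len(normalized):
        return "majority", answer
    return "disputed", None


def escape_pipes(value: str) -> str:
    # A literal "|" would start a new template parameter; {{!}} renders as "|".
    # (This sketch does not handle "}}" inside values.)
    return value.replace("|", "{{!}}")


def review_template(original: str, answers: Sequence[str]) -> str:
    """Render a hypothetical {{Accuracy review}} call that keeps the original
    text alongside each reviewer's answer and the current status."""
    status, resolved = review_status(answers)
    params = [f"original={escape_pipes(original)}", f"status={status}"]
    params += [f"answer{i}={escape_pipes(a)}" for i, a in enumerate(answers, start=1)]
    if resolved is not None:
        params.append(f"resolved={escape_pipes(resolved)}")
    return "{{Accuracy review|" + "|".join(params) + "}}"


print(review_template("population 1,234,567", ["outdated", "still accurate"]))
# {{Accuracy review|original=population 1,234,567|status=disputed|answer1=outdated|answer2=still accurate}}
```

A "disputed" status is the signal to present the question to a third reviewer; once a majority exists, the template records the resolved answer without discarding the original text.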

Possible stretch goals for Global Learning Xprize Meta-Team systems
[http://www.wiki.xprize.org/Meta-team#Goals] integration TBD.

Best regards,
James Salsman


Re: [Wikitech-l] need review and co-mentor volunteers for GSoC Accuracy review proposal

2015-02-12 Thread Brian Wolff

Have you run this by Wikipedians? (I'm assuming enwikipedia would be
your target audience.) I would recommend making sure that enwikipedia
is politically OK with this first, since it involves adding a bunch of
templates to articles; it would suck for a GSoC student if their
work wasn't used due to politics happening at the end.

Prepare a table of each word in article dumps indicating its age.

This in itself is a non-trivial problem (for a GSoC student, anyway),
assuming you need it for the entire enwikipedia and you need it up to
date as soon as people edit. Even getting the student sufficient
storage and CPU resources to actually compute that could potentially
be difficult (maybe?).

Convert flagged passages to GIFT questions for review and present them to one 
or more subscribed reviewers

Wouldn't you want to give the reviewers an actual form where they can
fill out the questions, not something in a markup language? (Unless you
mean you want to store them in that format internally, which seems
like a rather minor implementation detail.)
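On the presentation side, a stored GIFT essay question can be turned into an ordinary HTML form, so reviewers never see the markup. This sketch handles only the essay form (`::title:: text {}`) and uses Python's standard `html` and `re` modules; the `/submit-review` action URL, the field name `answer`, and the helper names are hypothetical.

```python
import html
import re

# A GIFT essay question: optional ::title::, question text, then an empty {}.
ESSAY = re.compile(
    r"^(?:::(?P<title>(?:\\.|[^:\\])*)::)?\s*(?P<text>.*?)\s*\{\}\s*$", re.S
)
# Backslash-escaped GIFT special characters, e.g. \= or \:
ESCAPED = re.compile(r"\\([~=#{}:])")


def gift_unescape(text: str) -> str:
    """Remove GIFT backslash escapes."""
    return ESCAPED.sub(r"\1", text)


def gift_essay_to_form(gift: str, action: str = "/submit-review") -> str:
    """Render a GIFT essay question as an HTML form with a free-text answer box."""
    match = ESSAY.match(gift.strip())
    if match is None:
        raise ValueError("not a GIFT essay question")
    title = html.escape(gift_unescape(match.group("title") or ""))
    text = html.escape(gift_unescape(match.group("text")))
    return (
        f'<form method="post" action="{html.escape(action)}">\n'
        "  <fieldset>\n"
        f"    <legend>{title}</legend>\n"
        f"    <label>{text}<br>\n"
        '      <textarea name="answer" rows="4" cols="60"></textarea>\n'
        "    </label>\n"
        "  </fieldset>\n"
        '  <button type="submit">Submit</button>\n'
        "</form>"
    )


print(gift_essay_to_form('::AR-1 Example:: Is "a\\=b" still accurate? {}'))
```

This supports the point that the storage format is an implementation detail: reviewers interact with the form, while GIFT stays the interchange format for later integration.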

--bawolff
