Re: [Wikitech-l] Tech Talk livestream chat transcript (was Re: [Wikimedia Technical Talks] Data and Decision Science at Wikimedia with Kate Zimmerman, 26 February 2020 @ 6PM UTC)

2020-02-26 Thread James Salsman
I note now that the full chat transcript has been restored; thank you.

I am still interested in the answer to the question.

Sincerely,
Jim

On Wed, Feb 26, 2020 at 10:46 AM James Salsman  wrote:
>
> Just now I asked the following question on the Technical Talk
> livestream at https://www.youtube.com/watch?v=J-CRsiwYM9w
>
> 10:19 AM: Page 20 of Robert West's 2016 Stanford thesis, "Human
> Navigation of Information Networks" says, "We have access to
> Wikimedia’s full server logs, containing all HTTP requests to
> Wikimedia projects."
> 10:19 AM: The text is at
> http://infolab.stanford.edu/~west1/pubs/West_Dissertation-2016.pdf
> 10:19 AM: Page 19 indicates that this information includes the "IP
> address, proxy information, and user agent."
> 10:20 AM: This is confirmed by West at
> https://www.youtube.com/watch?v=jQ0NPhT-fsE&t=25m40s
> 10:20 AM: Does the Foundation still share that identifying information
> with research affiliates? If so, how many are them world-wide; if not,
> when did sharing this information stop?
> 10:25 AM: MediaWiki @James Salsman I see your question, but donfly.
> Can we reach out to you after the talk?
> 10:25 AM: MediaWiki: sorry hit enter too soon!
> 10:26 AM: MediaWiki: I don't have the full-context of the thesis to
> ask kate to answer the question on the fly. Can we reach out to you
> after?
> 10:26 AM: James Salsman: With whom am I corresponding?
> 10:27 AM: James Salsman: Sarah?
> 10:28 AM: MediaWiki: Yes! That's me!
> 10:28 AM: James Salsman: Would it be easier to ask, "how many research
> affiliates does the Foundation share server logs with IP addresses?"
> 10:29 AM: MediaWiki: Yes, I can ask that.
> 10:30 AM: James Salsman: Thank you.
>
> At 10:34, the messages with the URLs I posted were removed from
> the chat log.
>
> At 10:36, the chat stream was removed from the video, which was
> replaced with the text, "Chat is disabled for this live stream," and
> now is completely missing.
>
> I am still interested in getting an answer to this question, but
> disturbed by the removal of links to sources. Could I please have an
> explanation?
>
> Sincerely,
> Jim

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Tech Talk livestream chat transcript (was Re: [Wikimedia Technical Talks] Data and Decision Science at Wikimedia with Kate Zimmerman, 26 February 2020 @ 6PM UTC)

2020-02-26 Thread James Salsman
Just now I asked the following question on the Technical Talk
livestream at https://www.youtube.com/watch?v=J-CRsiwYM9w

10:19 AM: Page 20 of Robert West's 2016 Stanford thesis, "Human
Navigation of Information Networks" says, "We have access to
Wikimedia’s full server logs, containing all HTTP requests to
Wikimedia projects."
10:19 AM: The text is at
http://infolab.stanford.edu/~west1/pubs/West_Dissertation-2016.pdf
10:19 AM: Page 19 indicates that this information includes the "IP
address, proxy information, and user agent."
10:20 AM: This is confirmed by West at
https://www.youtube.com/watch?v=jQ0NPhT-fsE&t=25m40s
10:20 AM: Does the Foundation still share that identifying information
with research affiliates? If so, how many are them world-wide; if not,
when did sharing this information stop?
10:25 AM: MediaWiki @James Salsman I see your question, but donfly.
Can we reach out to you after the talk?
10:25 AM: MediaWiki: sorry hit enter too soon!
10:26 AM: MediaWiki: I don't have the full-context of the thesis to
ask kate to answer the question on the fly. Can we reach out to you
after?
10:26 AM: James Salsman: With whom am I corresponding?
10:27 AM: James Salsman: Sarah?
10:28 AM: MediaWiki: Yes! That's me!
10:28 AM: James Salsman: Would it be easier to ask, "how many research
affiliates does the Foundation share server logs with IP addresses?"
10:29 AM: MediaWiki: Yes, I can ask that.
10:30 AM: James Salsman: Thank you.

At 10:34, the messages with the URLs I posted were removed from
the chat log.

At 10:36, the chat stream was removed from the video, which was
replaced with the text, "Chat is disabled for this live stream," and
now is completely missing.

I am still interested in getting an answer to this question, but
disturbed by the removal of links to sources. Could I please have an
explanation?

Sincerely,
Jim


Re: [Wikitech-l] [Wikimedia-l] [Wikimedia Announcements] Welcoming Wikimedia Foundation’s new CTO, Grant Ingersoll

2019-09-19 Thread James Salsman
Well, I'm thrilled about this, especially after having had a look
through 
https://www.slideshare.net/lucidworks/searching-for-better-code-presented-by-grant-ingersoll-lucidworks

Honestly, though, it's only the third best thing that happened this
week after Valerie Plame entering politics and the UC system divesting
from fossil fuels.

Grant, welcome! My advice is to make a long list of concrete KPIs
for contributor (e.g. editor) support, reach, and cloud support, in a
way that can be used for fundraising. The fundraising messaging has
been stuck for years on this thing about, "if everyone reading this
contributed the cost of a cup of coffee, then _some goal here_," which
is okay, but could be so much better flipped with the KPIs as the ask,
e.g., "Each $CURRENCY you donate will pay to support N additional
$CONTENTS," where the wikipedias can use ops measurements of the
resources typically needed to take an article from Start to B class,
for example, or how much time, server electricity including idle time,
and other resources it takes to get a new word added to Wiktionary to
some level of proficiency. If these units relate to the potential donor's
language or geography, all the better. People geolocated in the
developed world using languages with highly developed wikipedias and
wiktionaries can be told how much it would cost to, for example,
eliminate units of the various WP:BACKLOG items you find suitable in
multivariate (e.g., Latin square) donation message testing. (Or add new
technology projects like an intelligibility- and natural spoken
feedback version of https://www.speechace.co/api_sample/ hint
https://phabricator.wikimedia.org/T166929#5473028 hint.)
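The Latin-square testing idea above can be sketched with a small cyclic construction. This is a minimal illustration, not any actual fundraising tooling, and the variant names are made up:

```python
def latin_square(variants):
    """Cyclic Latin square: an n-by-n grid in which each message variant
    appears exactly once in every row and every column, so each donor
    cohort (row) sees every variant across positions without repeats."""
    n = len(variants)
    return [[variants[(row + col) % n] for col in range(n)] for row in range(n)]

# Hypothetical message variants for a three-cohort test.
square = latin_square(["coffee-cup ask", "KPI ask", "backlog ask"])
```

Each row of `square` is one cohort's rotation, which balances position effects when comparing variant performance.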

Also please take Curecoin instead of Bitcoin, even if that means
paying the extra transaction fee before converting the Curecoin to
cash. It is the height of folly to be as close to endorsing wasteful
electricity-based cryptocurrency as we already are, when alternatives
with a benefit are less commonly known. The only other blockchain
thing I like is that a long-term state-sponsored censorship mitigation
program can be based on copying the dumps to IPFS, but please also
support the CDN efforts like Encrypted-SNI:

https://twitter.com/jsalsman/status/1142172682751864832

https://twitter.com/jsalsman/status/1142940652851695616

https://twitter.com/jsalsman/status/1053786384463355905

Please let me know your thoughts.

Best regards,
Jim


[Wikitech-l] browser microphone audio upload gadgets?

2019-07-06 Thread James Salsman
Is there a standard MediaWiki audio recorder, e.g. something from
https://developer.mozilla.org/en-US/docs/Web/API/MediaStream_Recording_API#See_also
which is working in some javascript gadget?

My 2017 Google Summer of Code student made one for Wiktionary, but I'm
not sure whether it is still working:
http://youtube.com/watch?v=8Euhu4Q7HF4&t=38m

It's based on less cross-platform technology than
https://www.npmjs.com/package/react-mic

What is the MediaWiki state of the art for cross-platform microphone
audio upload?


Re: [Wikitech-l] [Wikimedia-l] C-team Statement on the Code of Conduct

2018-08-14 Thread James Salsman
Victoria,

Does the restriction on "Disclosure of a person's identity or other
private information without their consent" forbid the Foundation from
sharing the geolocation and IP addresses of editors with researchers
under NDA or law enforcement officials claiming to have, e.g., an
Interpol subpoena?

Sincerely,
Jim Salsman

On Tue, Aug 14, 2018 at 10:46 AM, Victoria Coleman
 wrote:
> Hello everyone,
>
> The executive leadership team, on behalf of the Foundation, would like to 
> issue a statement of unequivocal support for the Code of Conduct[1] and the 
> community-led Code of Conduct Committee. We believe that the development and 
> implementation of the Code are vital in ensuring the healthy functioning of 
> our technical communities and spaces. The Code of Conduct was created to 
> address obstacles and occasionally very problematic personal communications 
> that limit participation and cause real harm to community members and staff. 
> In engaging in this work we are setting the tone for the ways we collaborate 
> in tech. We are saying that treating others badly is not welcome in our 
> communities. And we are joining an important movement in the tech industry to 
> address these problems in a way that supports self-governance consistent with 
> our values.
>
> This initiative is critical in continuing the amazing work of our projects 
> and ensuring that they continue to flourish in delivering on the critical 
> vision of being the essential infrastructure of free knowledge now and 
> forever.
>
> Toby, Maggie, Eileen, Heather, Lisa, Katherine, Jaime, Joady, and Victoria
>
>
> https://www.mediawiki.org/wiki/Code_of_Conduct 
> 
>
>
>
>
> ___
> Wikimedia-l mailing list, guidelines at: 
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
> https://meta.wikimedia.org/wiki/Wikimedia-l
> New messages to: wikimedi...@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 
> 


Re: [Wikitech-l] [Wikimedia-l] ORES support checklist for communities

2018-04-09 Thread James Salsman
Amir,

Would you please comment on how these two techniques for creating new
articles affect the quality of ORES models?

(1) https://lists.wikimedia.org/pipermail/wiki-research-l/2018-March/006236.html

(2) https://lists.wikimedia.org/pipermail/wiki-research-l/2018-April/006238.html

Thank you.

Best regards,
Jim


On Mon, Apr 9, 2018 at 4:09 AM, Amir Sarabadani
 wrote:
> Hey,
> Scoring platform team aims to support more wikis but keeping track of how
> much support they need is not easy. This is why we built a tool that
> automatically gets updated and shows us an overview of the current support
> and specially it shows progress of labelling campaigns in different wikis
> so it's easier for us and the community to see which wiki is about to
> finish or which wiki is stalled.
>
> You can find the tool in https://tools.wmflabs.org/ores-support-checklist/
>
> The source code is in http://github.com/wiki-ai/ores-support-checklist.
> Pull requests are welcome
> To report problems or request new features, feel free to file a phabricator
> ticket tagged with ores-support-checklist (
> https://phabricator.wikimedia.org/tag/ores-support-checklist/)
>
> Best
> --
> Amir Sarabadani
> Software Engineer
>
> Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> Tel. (030) 219 158 26-0
> http://wikimedia.de
>
> Imagine a world in which every human being can freely share in the sum
> of all knowledge. Help us make it happen!
> http://spenden.wikimedia.de/
>
> Wikimedia Deutschland – Gesellschaft zur Förderung Freien Wissens e. V.
> Registered in the register of associations of Amtsgericht
> Berlin-Charlottenburg under number 23855 B. Recognized as charitable by
> Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
> ___
> Wikimedia-l mailing list, guidelines at: 
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
> https://meta.wikimedia.org/wiki/Wikimedia-l
> New messages to: wikimedi...@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 
> 


Re: [Wikitech-l] [Wikimedia-l] What's making you happy this week? (Week of 18 February 2018)

2018-02-21 Thread James Salsman
I am happy about https://www.youtube.com/watch?v=fpmRWCE7F_I&t=30m30s

Aaron Halfaker describes optimized ways to refine backlog presentation.

But I am unhappy that his first slides are missing and I hope he will
post the whole deck.

Also I would love to see the Turkish Wikipedia in InterPlanetary File
System, or at least its dumps.


On Sun, Feb 18, 2018 at 3:12 PM, Pine W  wrote:
> What's making me happy this week is Isarra's persistence in working on the
> Timeless skin. Timeless is based on Winter. [0] [1]
>
> For anyone who would like to try Timeless, it's available in Preferences
> under Appearance / Skin.
>
> What's making you happy this week?
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )
>
> [0] https://www.mediawiki.org/wiki/Skin:Timeless
> [1] https://www.mediawiki.org/wiki/Winter
> ___
> Wikimedia-l mailing list, guidelines at: 
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
> https://meta.wikimedia.org/wiki/Wikimedia-l
> New messages to: wikimedi...@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 
> 


Re: [Wikitech-l] [Wikimedia-l] AMD petition

2017-03-13 Thread James Salsman
Recent leaks suggest almost all commercial x86 processors have been
compromised by closed-source back doors which enable eavesdropping and DRM
copy protection, which in turn inhibits fair use.

On Mon, Mar 13, 2017 at 4:41 PM Lodewijk <lodew...@effeietsanders.org>
wrote:

> Hi Jim,
>
> Could you clarify the relationship with Wikimedia on this? I'm missing the
> link.
>
> Best,
> Lodewijk
>
> 2017-03-13 23:03 GMT+01:00 James Salsman <jsals...@gmail.com>:
>
> > Please join me in asking AMD to open-source the PSP (backdoor) in
> > their chips -- a chance to regain secure x86 hardware.
> >
> > https://www.change.org/p/advanced-micro-devices-amd-
> release-the-source-code-for-the-secure-processor-psp
>
>

[Wikitech-l] AMD petition

2017-03-13 Thread James Salsman
Please join me in asking AMD to open-source the PSP (backdoor) in
their chips -- a chance to regain secure x86 hardware.

https://www.change.org/p/advanced-micro-devices-amd-release-the-source-code-for-the-secure-processor-psp


[Wikitech-l] please critique my grant application

2017-03-12 Thread James Salsman
Please critique and endorse my grant application, especially after Doc
James replaces his name as the applicant so I can be the adviser and
my Google Summer of Code co-mentor and student can be co-grantees:
https://meta.wikimedia.org/wiki/Grants:Project/Intelligibility_Transcriptions

Thank you!


Re: [Wikitech-l] private key compromise OCSP declarations per RFC 5280

2017-02-02 Thread James Salsman
Bryan Davis wrote:
>
> The HTTPS tag (<https://phabricator.wikimedia.org/project/profile/162/>)
> and the Traffic component
> (<https://phabricator.wikimedia.org/project/profile/1201/>) would both
>seem reasonable.

Thanks Bryan, will do. I'm working on a fail-over method which won't
allow the kind of MITM attacks which WhatsApp is vulnerable to under
default settings. In the mean time, the White House web site
apparently has a certificate which was working last week but now
indicates it was revoked last May:

https://crt.sh/?q=60a5d3648459f4eb88700db0d08cda7f6139359c

Would it be a good idea to have HTTP ready to go in case HTTPS becomes unstable?
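The fail-over behavior being requested could look like the following minimal sketch. The status values, function names, and "certificates in preference order" model are my assumptions for illustration, not Wikimedia's actual traffic-layer configuration:

```python
# OCSP status values per RFC 6960: good, revoked, unknown.
GOOD, REVOKED, UNKNOWN = "good", "revoked", "unknown"

def pick_certificate(certs, ocsp_status):
    """Return the first certificate whose OCSP status is not 'revoked'.

    certs: certificate identifiers in preference order.
    ocsp_status: callable mapping a certificate id to GOOD/REVOKED/UNKNOWN.
    """
    for cert in certs:
        status = ocsp_status(cert)
        if status == REVOKED:
            continue  # fail over to the next configured certificate
        # GOOD, or UNKNOWN (responder unreachable): soft-fail and keep
        # serving, as most TLS clients do, rather than hard-failing.
        return cert
    raise RuntimeError("all configured certificates are revoked")
```

The point of the sketch is that a single OCSP revocation only takes the site down when every configured certificate is revoked.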


On Mon, Jan 30, 2017 at 10:06 AM, James Salsman <jsals...@gmail.com> wrote:
> I have been informed off-list that the answer to my question is no,
> and asked to open a phabricator task to allow for fail-over alternate
> certificate utilization in the case of revocations via OCSP or
> revocation list-based revocation.
>
> I am strongly in favor of doing so, but I don't know how to categorize
> such a task or the group to assign it to. Any ideas?
>
>
> On Sun, Jan 29, 2017 at 8:32 PM, James Salsman <jsals...@gmail.com> wrote:
>> Are Foundation servers able to withstand Online Certificate Status
>> Protocol certificate revocations, such as might occur according to RFC
>> 5280 when a government agency declares a private key compromised
>> because of secret evidence?


Re: [Wikitech-l] private key compromise OCSP declarations per RFC 5280

2017-01-30 Thread James Salsman
I have been informed off-list that the answer to my question is no,
and asked to open a phabricator task to allow for fail-over alternate
certificate utilization in the case of revocations via OCSP or
revocation list-based revocation.

I am strongly in favor of doing so, but I don't know how to categorize
such a task or the group to assign it to. Any ideas?


On Sun, Jan 29, 2017 at 8:32 PM, James Salsman <jsals...@gmail.com> wrote:
> Are Foundation servers able to withstand Online Certificate Status
> Protocol certificate revocations, such as might occur according to RFC
> 5280 when a government agency declares a private key compromised
> because of secret evidence?


[Wikitech-l] private key compromise OCSP declarations per RFC 5280

2017-01-29 Thread James Salsman
Are Foundation servers able to withstand Online Certificate Status
Protocol certificate revocations, such as might occur according to RFC
5280 when a government agency declares a private key compromised
because of secret evidence?


[Wikitech-l] New-feature demo: smart Recent Changes filtering with ORES

2016-12-02 Thread James Salsman
Who is speaking on https://www.youtube.com/watch?v=Jz_hJlkEqkc ?

What is the URL to test it?

I'm glad someone is using
https://meta.wikimedia.org/wiki/Objective_Revision_Evaluation_Service#Edit_quality_models


[Wikitech-l] measuring time to proofread wikipedias and the Making Work Pay Tax Credit

2016-09-17 Thread James Salsman
I am pleased to announce that, thanks to Google Summer of Code student
Priyanka Mandikal, the project for the Accuracy Review of Wikipedias
project has delivered a working demonstration of open source code and
data available here:

https://github.com/priyankamandikal/arowf/

Please try it out at:

http://tools.wmflabs.org/arowf/

We need your help to test it and try it out and send us comments. You
can read more about the project here:

https://priyankamandikal.github.io/posts/gsoc-2016-project-overview/

The formal project report, still in progress (Google docs comments
from anyone are most welcome) is at:

https://docs.google.com/document/d/1_AiOyVn9Qf5ne1qCHIygUU3OTJcbpkb14N3rItyjaVQ/edit

This allows experiments to measure, for example, how long it would
take to complete proofreading of the wikipedias with and without
paying editors to work alongside volunteers. I am sure everyone agrees
that is an interesting question which bears directly on budget
expectations. I hope multiple organizations use the published methods
and their Python implementations to make such measurements. I would
also like to suggest a proposal related to the questions in both of
the following reviews:

http://unotes.hartford.edu/announcements/images/2014_03_04_Cerasoli_and_Nicklin_publish_in_Psychological_Bulletin_.pdf

http://onlinelibrary.wiley.com/doi/10./1748-8583.12080/abstract
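The proofreading-time measurement described above can be framed as a naive throughput model. All numbers below are illustrative placeholders, and the linear-capacity assumption is mine, standing in for the actual measurements the arowf experiments would produce:

```python
def days_to_clear(backlog_items, volunteers, items_per_volunteer_day,
                  paid_editors=0, items_per_paid_day=0):
    """Naive linear throughput model: backlog divided by total daily
    review capacity. Every parameter is a placeholder to be replaced
    by measured values."""
    daily = (volunteers * items_per_volunteer_day
             + paid_editors * items_per_paid_day)
    if daily <= 0:
        raise ValueError("no review capacity")
    return backlog_items / daily

# Illustrative comparison: 1000 items, 10 volunteers at 5 items/day,
# with and without 2 paid editors at 25 items/day.
volunteers_only = days_to_clear(1000, 10, 5)   # 20.0 days
with_paid = days_to_clear(1000, 10, 5, 2, 25)  # 10.0 days
```

Real experiments would, of course, also have to model interaction effects between paid and volunteer reviewers rather than assuming their capacities simply add.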

The most recent solicitation of community input for the Foundation's
Public Policy team I've seen said that they would like suggestions for
specific issues as long as the suggestions did not involve
endorsements of or opposition to any specific candidates. My support
for adjusting copyright royalties on a sliding scale to transfer
wealth from larger to smaller artists has been made clear, and I do
not believe there are any concerns that I have not addressed
concerning alignment to mission or effectiveness. I would also like to
propose a related endorsement.

The Making Work Pay tax credit (MWPTC) is a negative payroll tax that
expired in 2010. It has all the advantages of an expanded Earned
Income Tax Credit (EITC) but would happen with every paycheck.
Reinstating the Making Work Pay tax credit would serve to reduce
economic inequality.

This proposal is within the scope of the Foundation's mission because
reducing economic inequality should serve to empower people to develop
educational content for the projects because of the increased levels
of support for artistic production among a broader set of potential
editors with additional discretionary free time due to increased
wealth. This proposal is needed because economic inequality produces
more excess avoidable deaths and leads to fewer years of productive
life than global warming. This proposal would provide substantial
benefits to the movement, the community, the Foundation, the US and
the world if it were to be successfully adopted. For the reasons
stated above, this proposal will be seen as positive.

Here is some background and supporting information:

* MWPTC overview: https://en.wikipedia.org/wiki/Making_Work_Pay_tax_credit

* MWPTC details: http://tpcprod.urban.org/taxtopics/2011_work.cfm

* Problems with expanding the EITC:
http://www.taxpolicycenter.org/taxvox/eitc-expansion-backed-obama-and-ryan-could-penalize-marriage-many-low-income-workers

* Educational advantages of expanding the EITC:
https://www.brookings.edu/opinions/this-policy-would-help-poor-kids-more-than-universal-pre-k-does/

* Financial advantages of expanding the EITC:
http://www.cbpp.org/research/federal-tax/strengthening-the-eitc-for-childless-workers-would-promote-work-and-reduce

* The working class has lost half their wealth over the past two
decades: https://www.nerdwallet.com/blog/finance/why-people-are-angry/

* Health effects of addressing economic inequality:
http://talknicer.com/ehlr.pdf

* Economic growth effects of addressing economic inequality:
http://talknicer.com/egma.pdf

* Unemployment and underemployment effects of addressing economic
inequality: http://diposit.ub.edu/dspace/bitstream/2445/33140/1/617293.pdf

For an example of how a campaign on this issue could be conducted
based on the issues identified in the sources above, please see:
http://bit.ly/mwptc

Please share your thoughts on the wikipedias proofreading time
measurement effort and this related public policy proposal.

I expect that some people will say that they do not understand how the
public policy proposal relates to the project to measure the amount of
time it would take to proofread the wikipedias. I am happy to explain
that in detail if and when needed. On a related note, I would like to
point out that the project report Google doc suggests future work
involving a peer learning system for speaking skills using the same
architecture as we derived from the constraints for successfully
performing simultaneous paid and volunteer proofreading. I would like
people to keep that in mind when evaluating the utility of these
proposals.


Re: [Wikitech-l] Reply: [GSoC 2016] Query about the ideas' project time

2016-03-21 Thread James Salsman
Hi Linxuan,

Thank you for your question:

 >... What does the "reputation score" in the description refer to?

I've asked Priyanka to reply with her current design, but here is some
of the advice I gave her:

"Each reviewer needs, at a minimum, data indicating the number and
proportion of reviewers who have agreed with them. However, the third level
of tie-breaking review introduces an extra bit for each disagreement which
determines whether agreement or disagreement should be counted in their
favor. So, even if a given reviewer only agrees with 50% of the other
reviewers, the determination of the tie breaker in each case of
disagreement controls whether their reputation score ranges from 0% to
100%. (As too does the agreement proportion, which is unlikely to be
exactly 50%.)

"Do you want the reviewers to know their agreement ratios and reputation
scores? How might their behavior change if they are and aren't told those?
Could there ever be a case when you might want to withhold them? Would
there ever be a benefit from distorting them? How about displaying them as
a range instead of distorting or withholding them? That last possibility
seems superior to me. You might want to do that when you are unsure that
the precision of the mathematical values is near the accuracy of the
knowledge they represent. Do you want to be able to tell each reviewer the
responses which have contributed to defects in their reputation scores,
i.e., do you want them to know which disagreements were tie-broken against
their favor?"

Her reply at the time was:

"In case of two reviewers agreeing, we add a +1 to the reputation. In case
of disagreement, we seek the opinion of a 3rd reviewer. If A says Yes, B
says No and C says Yes to an edit, A and C will have an agreement ratio of
50% and reputation of 100%, whereas B  will have an agreement ratio of 0
and reputation 0%? This would of course change as more edits are reviewed
by them."

I believe that is still an accurate description of the current design.
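The scheme quoted above can be sketched in a few lines. The function name and data layout here are illustrative assumptions, not the project's actual code; the scoring rules follow the description: pairwise agreement is tracked per reviewer, and the third reviewer's tie-break makes the majority verdict the final outcome:

```python
from collections import defaultdict
from itertools import combinations

def score_reviewers(reviews):
    """reviews maps item -> {reviewer: bool verdict}.

    Returns (agreement_ratio, reputation), both reviewer -> value in [0, 1].
    agreement_ratio: fraction of pairwise comparisons in which the reviewer
    agreed with another reviewer of the same item.
    reputation: fraction of items where the reviewer's verdict matched the
    final (majority / tie-broken) outcome.
    """
    agree = defaultdict(lambda: [0, 0])  # reviewer -> [agreements, comparisons]
    rep = defaultdict(lambda: [0, 0])    # reviewer -> [matches, items reviewed]
    for verdicts in reviews.values():
        for a, b in combinations(verdicts, 2):
            same = verdicts[a] == verdicts[b]
            for r in (a, b):
                agree[r][0] += same
                agree[r][1] += 1
        # The third reviewer acts as tie-breaker, so the majority verdict
        # is the final outcome for the item.
        outcome = 2 * sum(verdicts.values()) > len(verdicts)
        for r, v in verdicts.items():
            rep[r][0] += (v == outcome)
            rep[r][1] += 1
    ratio = {r: hits / n for r, (hits, n) in agree.items()}
    reputation = {r: hits / n for r, (hits, n) in rep.items()}
    return ratio, reputation

# The worked example from the thread: A says Yes, B says No,
# C (the tie-breaker) says Yes.
ratio, reputation = score_reviewers({"edit1": {"A": True, "B": False, "C": True}})
```

On that example this reproduces the numbers in Priyanka's reply: A and C end with a 50% agreement ratio and 100% reputation, B with 0% of each.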

Finally, I regret that the GSoC program doesn't allow more than one student
per project.

Best regards,
Jim Salsman

[Wikitech-l] Fwd: [GSoC Mentors] GSoC Org Apps close soon

2016-02-16 Thread James Salsman
If anyone has objections to Fabian, Maribel, and me continuing to mentor
http://mediawiki.org/wiki/Accuracy_review
under the GSoC program, please state your objections now.

Quim, thank you for your kind words on the IEG application.

-- Forwarded message --
From: *'sttaylor' via Google Summer of Code Mentors List* <
google-summer-of-code-mentors-l...@googlegroups.com>
Date: Tuesday, February 16, 2016
Subject: [GSoC Mentors] GSoC Org Apps close soon
To: Google Summer of Code Mentors List <
google-summer-of-code-mentors-l...@googlegroups.com>


Just a quick reminder that the deadline to apply to be a GSoC 2016 mentor
organization is this Friday, February 19th at 19:00 UTC.


Visit our new website to apply as an organization
today. For helpful tips on what is expected as a mentor organization and as
a mentor or org admin for GSoC 2016, read the Mentor Manual.

We will not accept late applications under any circumstances.


Good luck to all org applicants!


Best,

Stephanie

-- 
You received this message because you are subscribed to the Google Groups
"Google Summer of Code Mentors List" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to google-summer-of-code-mentors-list+unsubscr...@googlegroups.com.
To post to this group, send email to
google-summer-of-code-mentors-l...@googlegroups.com.
Visit this group at
https://groups.google.com/group/google-summer-of-code-mentors-list.
For more options, visit https://groups.google.com/d/optout.

Re: [Wikitech-l] Could someone relatively new to Python please QA the Accuracy review login framework?

2016-01-18 Thread James Salsman
>>> Have you looked at using OAuth for authentication?
>>
>> Yes; the modules in use support OAuth but we made a conscious decision to
>> support anonymity. Lack of anonymity can interfere with the operation of the
>> reviewer reputation database.
>
> I'd love to read the background discussion that led to that decision.

Here is the pertinent excerpt:

"I would prefer to have text presented to reviewers anonymously. While
we can and do make reputation decisions about particular users,
wikipedia editing is generally pseudonymous with little control over
identity and password security. There are already tools for addressing
user-oriented issues. All of the accuracy review contemplated in the
original assignment assumes that review is anonymous so that reviewers
can not be influenced by, e.g., commercial loyalties or bribery."

> Could you identify which part of MediaWiki's OAuth implementation has
> unacceptable problems regarding anonymity?

Let me think about that and respond later, please. Upgrading to do
that might be more configuration than re-coding.

> If you are setting high standards/promises in that regard, your
> alternative implementation of user authentication will need to be
> extremely carefully written (as will your entire codebase need very
> good security auditing).

Hence my request for people to have a look at it. The Python Flask
default login system is being used.


[Wikitech-l] Could someone relatively new to Python please QA the Accuracy review login framework?

2016-01-18 Thread James Salsman
> Have you looked at using OAuth for authentication?

Yes; the modules in use support OAuth but we made a conscious decision to
support anonymity. Lack of anonymity can interfere with the operation of
the reviewer reputation database.

On Tuesday, January 12, 2016, James Salsman <jsals...@gmail.com> wrote:

> An Outreachy candidate for http://mediawiki.org/wiki/Accuracy_review who
> went ahead and started unpaid has been making good progress, and is about
> to land the central guts of the project on github. It's a new way to
> transition from creating to maintaining Wikipedia articles, with an
> emphasis on detecting outdated statistics, fighting bias including paid
> advocacy of all kinds, and proofreading WEP student work. It's been going
> slow, mostly because the original trial run architecture was too dependent
> on email.
>
> However, before she gets there, could one or two people who are maybe
> beginner or intermediate with Python but advanced with Mediawiki or PHP
> please test her user authentication and login framework?
>
> https://github.com/priyankamandikal/wikireview/
> <https://github.com/priyankamandikal/wikireview/issues>
>
> It's built for PythonAnywhere because it shouldn't run on Wikimedia
> servers, because of the safe harbor DMCA provisions precluding editorial
> control by web hosts. Please report any issues on github and note your
> results on the Phabricator task to prevent duplication of effort.
>
> Thanks in advance!
>
> Best regards,
> Jim Salsman
>
>

[Wikitech-l] Accuracy review on tool labs?

2016-01-13 Thread James Salsman
John,

Are you sure it makes sense to run accuracy review on wikimedia servers?

Best regards,
Jim

[Wikitech-l] Could someone relatively new to Python please QA the Accuracy review login framework?

2016-01-12 Thread James Salsman
An Outreachy candidate for http://mediawiki.org/wiki/Accuracy_review who
went ahead and started unpaid has been making good progress, and is about
to land the central guts of the project on GitHub. It's a new way to
transition from creating to maintaining Wikipedia articles, with an
emphasis on detecting outdated statistics, fighting bias including paid
advocacy of all kinds, and proofreading WEP student work. It's been going
slowly, mostly because the original trial-run architecture was too dependent
on email.

However, before she gets there, could one or two people who are maybe
beginner or intermediate with Python but advanced with MediaWiki or PHP
please test her user authentication and login framework?

https://github.com/priyankamandikal/wikireview/


It's built for PythonAnywhere because it shouldn't run on Wikimedia
servers: the DMCA safe-harbor provisions preclude editorial control by
web hosts. Please report any issues on GitHub and note your
results on the Phabricator task to prevent duplication of effort.

Thanks in advance!

Best regards,
Jim Salsman

[Wikitech-l] Fwd: Database administration support (was Re: IRC office hours: Shared hosting)

2015-12-29 Thread James Salsman
Sorry I forgot to copy this list.

-- Forwarded message --
From: *James Salsman* <jsals...@gmail.com>
Date: Tuesday, December 22, 2015
Subject: Database administration support (was Re: IRC office hours: Shared
hosting)
To: Wikimedia Mailing List <wikimedi...@lists.wikimedia.org>


On Sunday, December 20, 2015, Brian Wolff <bawo...@gmail.com
<javascript:_e(%7B%7D,'cvml','bawo...@gmail.com');>> wrote:

> If you want to get Dispenser his hard disk space, you should take it
> up with the labs people, or at the very least some thread where it
> would be on-topic.
>

The labs people are so understaffed that two extremely important anti-spam
bots recently had to be taken offline for far longer than has been typical
in recent years.

I propose Foundation management allocate the necessary resources and
recommend the hiring of sufficient personnel and purchasing of sufficient,
non NSA-compatible (i.e., discount and homebrew style) equipment
to properly support both existing infrastructural bots and similar projects
such as Dispenser's reflinks cache.

I would also like to propose that the Foundation oppose the TPP provisions
deleterious to our interests, and that this position be endorsed on the
Public Policy list.


> Then by definition it wouldn't be a third-party spam framework if WMF
> was running it.


I am not proposing that the WMF take the bots over, just that it meet
their necessary service-level requirements.

Sincerely,
Jim

Re: [Wikitech-l] [Wikimedia-l] IRC office hours: Shared hosting

2015-12-20 Thread James Salsman
Were there any objections to my request below?

Can we also please hire additional database, system, and if necessary
network administration support to make sure that the third party spam
prevention bot infrastructure is supported more robustly in the future?

On Monday, December 14, 2015, James Salsman <jsals...@gmail.com> wrote:

> Hi Giles,
>
> I regret I will probably not be available for the IRC office hours as
> scheduled.
>
> In the discussion of shared hosting, I worry that en:User:Dispenser's
> reflinks project, which requires a 20 TB cache, is being forgotten
> again. He tried to host it himself, but it's offline again. This data
> is essential in maintaining an audit trail of references as long as
> the Internet Archive respects robots.txt retroactively, allowing those
> who inherit domains to censor them, even if they have already been
> used as a reference in Wikipedia. Keeping the cache is absolutely a
> fair use right in the US, in both statutory and case law, and it is
> essential to be able to track down patterns of attempts at deceptive
> editing to address quality concerns around deliberately biased editing
> such as paid editing. Because of the sensitivity of this goal, the
> Foundation should certainly bear the risk of hosting the reflinks
> cache. However, in the past, 20 TB was considered excessive, even
> though the cost was shown to be less than $5000 without whatever Dell
> NSA-enabled hardware you usually buy.
>
> Would you please reach out to en:User:Dispenser and offer them the
> 20 TB hosting solution they need for the Foundation to bear the risk of
> the reflinks cache?  Thank you for your kind consideration.
>
> Best regards,
> Jim
>

Re: [Wikitech-l] [Wikimedia-l] IRC office hours: Shared hosting

2015-12-14 Thread James Salsman
Hi Giles,

I regret I will probably not be available for the IRC office hours as scheduled.

In the discussion of shared hosting, I worry that en:User:Dispenser's
reflinks project, which requires a 20 TB cache, is being forgotten
again. He tried to host it himself, but it's offline again. This data
is essential in maintaining an audit trail of references as long as
the Internet Archive respects robots.txt retroactively, allowing those
who inherit domains to censor them, even if they have already been
used as a reference in Wikipedia. Keeping the cache is absolutely a
fair use right in the US, in both statutory and case law, and it is
essential to be able to track down patterns of attempts at deceptive
editing to address quality concerns around deliberately biased editing
such as paid editing. Because of the sensitivity of this goal, the
Foundation should certainly bear the risk of hosting the reflinks
cache. However, in the past, 20 TB was considered excessive, even
though the cost was shown to be less than $5000 without whatever Dell
NSA-enabled hardware you usually buy.

Would you please reach out to en:User:Dispenser and offer them the
20 TB hosting solution they need for the Foundation to bear the risk of
the reflinks cache?  Thank you for your kind consideration.

Best regards,
Jim


Re: [Wikitech-l] need review and co-mentor volunteers for GSoC Accuracy review proposal

2015-02-13 Thread James Salsman
Risker wrote:

... relying on suggestions from a six-year-old strategy document
 when we're about to start a new strategic session, isn't the best
 course of action.

A strategy proposal which never garnered criticism after so many
opportunities would seem to qualify as at least an emergent strategy
within the meaning of the slide and narrative at
https://www.youtube.com/watch?v=N4Kvj5vCaW0&t=19m30s

Furthermore, the initial limited subtask would be much more difficult
to evaluate as a strategy without a working prototype, including by
the Bot Approvals Group, which demands working code before making a
final decision on implementation. Trying to second-guess the BAG is
presumptuous.

Is it possible that supporting updates to out-of-date articles would
not be part of any successful strategy for the Foundation? I have
posted multiple series of statistics to wiki-research-l in the past
several months showing that quality work is shifting from creating
new content to maintaining old content, and will be happy to
recapitulate them should anyone suggest that it could be.

 what exactly is the plan for doing something with this information.

It will be made available to volunteers as a backlog list which
community members may or may not choose to work on. The Foundation
can't prescribe mandatory content improvement work without putting the
safe harbor provisions in jeopardy. Volunteers will be attracted to
working on such updates in proportion to the extent they see them as
being a worthy use of their editing time.

I have additional detailed plans for testing which I will be happy to
discuss with interested co-mentors, because depending on available
resources there could be a way to eliminate substantial duplication of
effort.

I have updated the synopses at https://www.mediawiki.org/wiki/Accuracy_review
and https://phabricator.wikimedia.org/T89416

Best regards,
James Salsman

 I invite review of this preliminary proposal for a Google Summer of
 Code project:
  http://www.mediawiki.org/wiki/Accuracy_review

 If you would like to co-mentor this project, please sign up. I've been
 a GSoC mentor every year since 2010, and successfully mentored two
 students in 2012 resulting in work which has become academically
 relevant, including in languages which I cannot read, e.g.
 http://talknicer.com/turkish-tablet.pdf. I am most interested in
 co-mentors at the WMF or Wiki Education Foundation involved with
 engineering, design, or education.


Re: [Wikitech-l] need review and co-mentor volunteers for GSoC Accuracy review proposal

2015-02-13 Thread James Salsman
Risker wrote:

... it received a single support vote

There are two supporters including myself who indicated they are
willing to work on it, and it also received support at
https://strategy.wikimedia.org/wiki/Favorites/Lodewijk

Many of the implemented proposals received less formal process
support, for example:
https://strategy.wikimedia.org/wiki/Proposal:Foundation-Announce-l
https://strategy.wikimedia.org/wiki/Proposal:Create_Wikisource_for_Yiddish
https://strategy.wikimedia.org/wiki/Proposal:Allow_IPs_to_edit_sections_on_English_Wikipedia_(done)
https://strategy.wikimedia.org/wiki/Proposal_talk:Implement_secret_ballots_(Done)
https://strategy.wikimedia.org/wiki/Proposal:IPhone/iPod_Touch_Offical_Wikipedia_App_(Done)
https://strategy.wikimedia.org/wiki/Proposal:Mobiltelefonversion_von_Wikipedia_(Done)
... and at least four more just that I have looked through so far.

Moreover, according to the vote scoring system, I believe it ranked in
the top 8% out of several hundred proposals, although that information
is apparently no longer available.

... there's no basis to believe that this ... will actually identify
 inaccuracies in the text

Do you believe that if you find an article about a geographic region
with the words "population 1,234,567" or "gross national product"
within the same grammatical clause as a number, and you know that text
was inserted 10 years ago, that you have not found a likely
out-of-date inaccuracy? What reason could there possibly be to believe
otherwise?
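The heuristic described above can be sketched in a few lines. This is purely illustrative: the function name, keyword list, and age threshold are invented for the example, and the revision-age check is passed in as a parameter rather than computed from the dump history as the real proposal would do.

```python
import re

# Keyword-near-a-number heuristic from the argument above: a clause
# mentioning "population" or "gross national product" next to a number,
# inserted long ago, is a likely out-of-date statistic.
STALE_KEYWORDS = re.compile(
    r'\b(population|gross national product|GNP|GDP)\b[^.;,]*?'
    r'\b\d[\d,]*\b',
    re.IGNORECASE)

def flag_stale_candidates(text, inserted_year, current_year=2015, max_age=10):
    """Return matched clauses if the text is at least max_age years old."""
    if current_year - inserted_year < max_age:
        return []
    return [m.group(0) for m in STALE_KEYWORDS.finditer(text)]

hits = flag_stale_candidates(
    "The region had a population of 1,234,567 at the last census.",
    inserted_year=2005)
# hits -> ["population of 1,234,567"]
```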

... It would require tens of thousands of person-hours (if not more) to
 analyse the data, and not a single article would be improved.

On the contrary, we can try it on 100 randomly selected vital
articles, and if we don't have enough data to make an extrapolation
with useful confidence intervals, we can try it on a slightly larger
sample of them. This is something the GSoC students can do themselves,
without any volunteer support. But what reason is there to believe
that such support won't be forthcoming if requested from the
copyeditor's guild or similar wikiproject, for example?
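As a sketch of the extrapolation step proposed above, a Wilson score interval gives the kind of confidence bound that would tell us whether a sample of 100 is enough. The counts here are invented purely for illustration.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - half, centre + half)

# Hypothetical: 37 of 100 sampled vital articles contain flagged passages.
low, high = wilson_interval(37, 100)
```

If the resulting interval is too wide to be useful, the sample is enlarged, exactly as the paragraph above suggests.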

... Your proposal requires massive time commitment from reviewers

Why would it require any more time commitment than the existing 17,200
articles in [[Category:Wikipedia articles needing factual
verification]]? Where is the requirement? Volunteer editors are free
to spend their time in the manner which they believe will best serve
improvements.

... it doesn't even fix out-of-date information.

Do you think actual fact checking should be done by people or bots?

... There is no indication at all that there is any interest on the part
 of Wikipedians to review data identified in the manner you propose.

Most of the WP:BACKLOG categories have articles entering and exiting
them every day. What reason is there to believe that articles selected
by an automated accuracy review process would be any different?

... there's no basis to believe that this project would have any
 effect on accuracy

Even if you had airtight evidence that was incontrovertibly true (and
for the reasons above, there can obviously be no such evidence),
wouldn't it still be the case that there would only be one way to find
out?

Best regards,
James Salsman


On Fri, Feb 13, 2015 at 10:58 AM, James Salsman jsals...@gmail.com wrote:
 Risker wrote:

... relying on suggestions from a six-year-old strategy document
 when we're about to start a new strategic session, isn't the best
 course of action.

 A strategy proposal which never garnered criticism after so many
 opportunities would seem to qualify as at least an emergent strategy
 within the meaning of the slide and narrative at
 https://www.youtube.com/watch?v=N4Kvj5vCaW0&t=19m30s

 Furthermore, the initial limited subtask would be much more difficult
 to evaluate as a strategy without a working prototype, including by
 the Bot Approvals Group which demands working code before making a
 final decision on implementation. Trying to second guess the BAG is
 presumptuous.

 Is it possible that supporting updates to out-of-date articles would
 not be part of any successful strategy for the Foundation? I have
 posted multiple series of statistics to wiki-research-l in the past
 several months showing that quality work is shifting from creating
 new content to maintaining old content, and will be happy to
 recapitulate them should anyone suggest that it could be.

 what exactly is the plan for doing something with this information.

 It will be made available to volunteers as a backlog list which
 community members may or may not choose to work on. The Foundation
 can't prescribe mandatory content improvement work without putting the
 safe harbor provisions in jeopardy. Volunteers will be attracted to
 working on such updates in proportion to the extent they see them as
 being a worthy use of their editing time.

 I have additional detailed plans for testing which I will be happy to
 discuss with interested co-mentors, because

[Wikitech-l] need review and co-mentor volunteers for GSoC Accuracy review proposal

2015-02-12 Thread James Salsman
I invite review of this preliminary proposal for a Google Summer of
Code project:
 http://www.mediawiki.org/wiki/Accuracy_review

If you would like to co-mentor this project, please sign up. I've been
a GSoC mentor every year since 2010, and successfully mentored two
students in 2012 resulting in work which has become academically
relevant, including in languages which I cannot read, e.g.
http://talknicer.com/turkish-tablet.pdf. I am most interested in
co-mentors at the WMF or Wiki Education Foundation involved with
engineering, design, or education.

Synopsis:

Create a Pywikibot to find articles in given categories, category
trees, and lists. For each such article, add in-line templates to
indicate the location of passages with (1) facts and statistics which
are likely to have become out of date and have not been updated in a
given number of years, and (2) phrases which are likely unclear. Use a
customizable set of keywords and the DELPH-IN LOGON parser
[http://erg.delph-in.net/logon] to find such passages for review.
Prepare a table of each word in article dumps indicating its age.
Convert flagged passages to GIFT questions
[http://microformats.org/wiki/gift] for review and present them to one
or more subscribed reviewers. Update the source template with the
reviewer(s)' answers to the GIFT question, but keep the original text
as part of the template. When reviewers disagree, update the template
to reflect that fact, and present the question to a third reviewer to
break the tie.

Possible stretch goals for Global Learning Xprize Meta-Team systems
[http://www.wiki.xprize.org/Meta-team#Goals] integration TBD.
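As an illustration of the flag-to-GIFT step in the synopsis above: the function name and example passage are hypothetical, and a production version would also need to escape GIFT's special characters (~ = # { }) inside the answer text.

```python
def passage_to_gift(passage, flagged_value):
    """Turn a flagged passage into a GIFT short-answer question by
    blanking out the suspect value. The reviewer's reply becomes the
    proposed replacement; the original value is kept in the template."""
    stem = passage.replace(flagged_value, "_____")
    # GIFT short-answer syntax: question text followed by {=accepted answer}.
    return "%s {=%s}" % (stem, flagged_value)

q = passage_to_gift(
    "Ruritania has a population of 1,234,567.",
    "1,234,567")
# q -> "Ruritania has a population of _____. {=1,234,567}"
```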

Best regards,
James Salsman


Re: [Wikitech-l] dnschain

2014-04-30 Thread James Salsman
Daniel Friesen wrote:

... The okTurtles/DNSChain authors...
 make ridiculous statements like "It depends on group consensus,
 but the group might not be very bright. What happens then?"

While I agree with much of Daniel's analysis, that part was actually
the most compelling of all the arguments against convergence.io,
except for the part about okTurtles/dnschain accepting multiple
passwords which decrypt the same cyphertext to different data sets,
because http://xkcd.com/538/

And that part is more than compelling enough for me to remain
convinced that okTurtles/dnschain is superior to Convergence.

I enjoyed the https://www.youtube.com/watch?v=Z7Wl2FW2TcA
video because I used to sit 10 meters from Kipp Hickman at Netscape
when he was adding certificate authorities to SSL. I remember him
joking about it in the hallway, right next to the letter from the NSA,
pinned up across from Dan Mosedale's cube, which said Netscape would be
in trouble if it didn't comply with various demands. Five years later I
was reviewing CALEA compliance documents at
Cisco. I wonder what Mosedale wants to do for DNS these days.

Best regards,
James Salsman


Re: [Wikitech-l] dnschain

2014-04-30 Thread James Salsman
 it just proxies whatever normal public dns you tell it to

Presumably they seed the namecoin table with DNS records and use those
instead when they exist? I don't know whether those can be expired
efficiently.

 As for on the current web making sure you're sending
 your password to the right person, no one is intercepting
 your credit card details, who you're talking to isn't being
 tracked by anyone but the site itself, etc... well okTurtles
 just leaves that up to the same certificate authorities
 they don't trust

It seems like they would take the next logical step and verify
namecoin-cached public key fingerprints of both the site and the
certificate before initiating a traditional SSL connection (and/or
better revocation support.)
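A minimal sketch of the fingerprint check suggested above, assuming the pinned value comes from the namecoin cache. The certificate bytes here are dummies; real code would obtain them from ssl's getpeercert(binary_form=True) before completing the handshake.

```python
import hashlib

def fingerprint(der_bytes):
    """SHA-256 fingerprint of a certificate's DER encoding, colon-separated."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

def verify_pinned(der_bytes, pinned):
    """Accept the certificate only if it matches the cached fingerprint."""
    return fingerprint(der_bytes) == pinned

# Illustrative only: stand-in bytes where a real client would use the
# peer certificate's DER encoding.
cert = b"dummy certificate bytes"
pin = fingerprint(cert)
```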


[Wikitech-l] dnschain

2014-04-29 Thread James Salsman
Would someone please review this DNS proposal for secure HTTPS?

https://github.com/okTurtles/dnschain
http://okturtles.com/other/dnschain_okturtles_overview.pdf
http://okturtles.com/

It is new, but it appears to be the soundest approach to securing DNS
for HTTPS at present. Thank you.

Best regards,
James Salsman

[Wikitech-l] Conflict resolution wikis

2014-02-15 Thread James Salsman
Please see "WikiWinWin: A Wiki Based System Together with Win Win
Method for Collaborative Requirements Negotiation" by Ledan Huang,
Xiaobo Wu and Yangu Zhang (2013):
http://www.atlantis-press.com/php/download_paper.php?id=10885

Does someone want to do a MediaWiki extension for Google Summer of Code or
the like to implement a riff on that?

It's applicable to more than just software development, and the software
development it discusses includes collaborative documentation. I suspect
that it is very similar to general accuracy-maintenance automation, and it
still works better with a human participant driving the process schedule
(to the extent that the human is skilled at it), but there are some very
attractive opportunities for e.g. Wikidata and maintenance-bot integration
down the road if it works out.

Best regards,
James Salsman

Re: [Wikitech-l] ARM servers

2014-01-13 Thread James Salsman
Eugene wrote:

... OS support is not mature yet, especially for ARMv8 (64-bit).

Does someone have an exhaustive list of packages which we depend on
for production but aren't available as ARM binaries yet? We could try
to build those.

As for development, I understand that Oracle's JDK isn't on ARM yet,
but am not sure why OpenJDK wouldn't be strongly preferred. Maybe the
multimedia team wants a 64 bit JRE for Adobe Flash C++ CrossBridge
(formerly Alchemy) as per http://adobe-flash.github.io/crossbridge/
but I assume they probably have Macs, or can get them if they need to
compile Flash applets for non-WebRTC-compliant client browser support.

Can someone more familiar with the Foundation's server infrastructure
needs than I please create a page somewhere with a checklist of
packages, modules, tools, etc., which need to be on ARM but aren't
yet? Jasper mentioned that we need virtualization for Labs but aren't
using Xen. It would be great to see what the developers of the
virtualization that Labs uses say about prospects for ARM builds.

Best regards,
James

On Sun, Jan 12, 2014 at 5:05 PM, James Salsman jsals...@gmail.com wrote:
 Nicolas Charbonnier's "Latest ARM Server solutions booths tour" may be
 of some interest for those of you interested in low power server
 hardware:

 http://armdevices.net/2013/12/30/latest-arm-server-solutions-booths-tour/

 Mitac isn't represented there, but he did an interview of them a year
 and a half ago:

 http://armdevices.net/2012/06/07/mitac-gfx-arm-server/


[Wikitech-l] ARM servers

2014-01-12 Thread James Salsman
Nicolas Charbonnier's "Latest ARM Server solutions booths tour" may be
of some interest for those of you interested in low power server
hardware:

http://armdevices.net/2013/12/30/latest-arm-server-solutions-booths-tour/

Mitac isn't represented there, but he did an interview of them a year
and a half ago:

http://armdevices.net/2012/06/07/mitac-gfx-arm-server/


[Wikitech-l] Principal component analysis for multivariate testing

2013-11-01 Thread James Salsman
I have long wondered why the WMF Fundraising department seems stuck on
A/B testing instead of multivariate analysis.[1] Does anyone know?

In any case, modern multivariate analysis depends on principal
component analysis[2] (PCA) and it really does work great. I have been
told for years that it is planned and can't wait to see what happens
to fundraising when implemented.

I am writing because I just noticed that EIGENSOFT[3] version 5 has a
new option for PCA projection with large amounts of missing data in
pca.c,[4] which is really clean code, too.
I hope this helps.

Best regards,
James Salsman

[1] http://en.wikipedia.org/wiki/Multivariate_analysis
[2] http://en.wikipedia.org/wiki/Principal_component_analysis
[3] http://www.hsph.harvard.edu/alkes-price/software/
[4] http://www.hsph.harvard.edu/alkes-price/files/2013/08/EIG5.0.1.tar.gz
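The EIGENSOFT option referenced above implements its own projection method; as a generic illustration only (not EIGENSOFT's algorithm), here is a pure-Python sketch of extracting the leading principal component by power iteration, with missing entries (None) crudely mean-imputed first.

```python
import math

def leading_component(rows, iters=200):
    """Leading principal component of row-major data by power iteration.
    Missing entries (None) are replaced by the column mean first -- a
    crude stand-in for the more careful projection EIGENSOFT performs."""
    cols = list(zip(*rows))
    means = [sum(v for v in c if v is not None) /
             max(1, sum(v is not None for v in c)) for c in cols]
    centred = [[(v if v is not None else means[j]) - means[j]
                for j, v in enumerate(row)] for row in rows]
    dim = len(cols)
    vec = [1.0] * dim
    for _ in range(iters):
        # Multiply by X^T X without forming the covariance matrix.
        proj = [sum(r[j] * vec[j] for j in range(dim)) for r in centred]
        vec = [sum(proj[i] * centred[i][j] for i in range(len(centred)))
               for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vec = [x / norm for x in vec]
    return vec

# Toy data: the second coordinate is twice the first; one entry missing.
pc = leading_component([[1.0, 2.0], [2.0, 4.0], [3.0, None], [4.0, 8.0]])
```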


Re: [Wikitech-l] [Wikimedia-l] Updates on VE data analysis

2013-07-26 Thread James Salsman
Dario,

Do you intend to measure the total number of edits per day prior to
and after the visual editor roll-out?

It appears that you have not analyzed or presented any data associated
with those statistics.

For example, why are you not providing a daily version of the hourly
graph at http://ee-dashboard.wmflabs.org/graphs/enwiki_ve_hourly_by_ui
?


Re: [Wikitech-l] [Wikimedia-l] Updates on VE data analysis

2013-07-26 Thread James Salsman
On Fri, Jul 26, 2013 at 7:28 PM, Dario Taraborelli
dtarabore...@wikimedia.org wrote:
...
 We do have a graph of total hourly edits on enwiki across mainspaces
 here: http://ee-dashboard.wmflabs.org/graphs/enwiki_edits_api - it's
 trivial to bin by day and filter to the main namespace only; I'll add
 this to my todo list.

Thank you! Here is a daily graph of edits by source and visual editor with
totals:

http://i.imgur.com/2f0tmEu.png

It would be great to know what the average total edits per day was in June.
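The day-binning step described as trivial above amounts to truncating each hourly timestamp to its date and summing. A sketch over invented (timestamp, count) rows:

```python
from collections import defaultdict

def bin_by_day(hourly_rows):
    """Collapse (ISO timestamp, edit count) rows into per-day totals."""
    daily = defaultdict(int)
    for timestamp, count in hourly_rows:
        daily[timestamp[:10]] += count  # 'YYYY-MM-DDTHH:MM' -> 'YYYY-MM-DD'
    return dict(daily)

totals = bin_by_day([
    ("2013-07-25T10:00", 120),
    ("2013-07-25T11:00", 95),
    ("2013-07-26T10:00", 140),
])
# totals -> {"2013-07-25": 215, "2013-07-26": 140}
```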

Re: [Wikitech-l] Parallel computing project

2010-10-27 Thread James Salsman
Aryeh Gregor writes:

 To clarify, the subject needs to 1) be reasonably doable in a short
 timeframe, 2) not build on top of something that's already too
 optimized

Integrating a subset of RTMP (e.g. the
http://code.google.com/p/rtmplite subset) into the chunk-based file
upload API -- http://www.mediawiki.org/wiki/API:Upload#Chunked_upload
-- would be an example of parallel I/O that we really need if we ever
hope to have reasonable microphone uploads for Wiktionary
pronunciation collection.  I know Flash sucks, but it sucks way less
for microphone upload than currently nonexistent HTML5 audio upload
support, client side Java, or any other alternative, and probably will
suck way less than any of those alternatives for years.  Soon GNU
Gnash should have microphone Speex upload on all three major
platforms, assuming the Gnash programming team doesn't starve to death
first.
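The chunk-based upload API mentioned above carries each chunk with its byte offset so the server can reassemble the file. As a local-only sketch of the client-side splitting (illustrative: a real client would POST each chunk to api.php?action=upload with filekey and offset parameters):

```python
def iter_chunks(data, chunk_size):
    """Yield (offset, chunk) pairs in the shape MediaWiki's
    chunked-upload API expects: each request carries the byte offset
    of the chunk plus the chunk bytes themselves."""
    for offset in range(0, len(data), chunk_size):
        yield offset, data[offset:offset + chunk_size]

# Pretend this is recorded audio data awaiting upload.
chunks = list(iter_chunks(b"0123456789abcdef", 5))
# chunks -> [(0, b"01234"), (5, b"56789"), (10, b"abcde"), (15, b"f")]
```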



Re: [Wikitech-l] fundraising earmarks for code review, image bundle dumps, search failover auction, Wikinews independence, bugzilla queue maintenance etc.

2010-09-06 Thread James Salsman
 Tim Starling wrote:

 I meant the code: CentralNotice, DonationInterface, GeoLite,
 ContactPageFundraiser, the Drupal extension, etc.

What remains to be done on those projects?  The only unassigned bug of
any immediately apparent consequence on any of those keywords I was
able to find is bug 24682, which looks like it might have a patch
already described in it.

 I didn't think you were a committer.

My contributions for audio recording uploads are not ready because
they depend on the upload redesign and client-side Flash.  I am still
waiting to hear from anyone why the current state of Flash is any less
closed than that of Java.  I am willing to give the benefit of the
doubt that people have simply not researched the situation with
Adobe's current public documentation and license along with the state
of Haxe and Gnash, but in the mean time I can wait for the upload
redesign before I take up that issue in earnest.  I'm also trying to
raise money for Gnash developers to make that particular hurdle a
complete non-issue.

Perhaps there are Mediawiki users other than the Foundation who would
not be opposed to the use of Flash for microphone audio upload?

Best regards,
James Salsman



[Wikitech-l] list of proposed fundraising stimuli (was Re: fundraising...)

2010-09-04 Thread James Salsman
Ryan Kaldari wrote:

 ... [we're] in the process of hooking up Open Web Analytics
 http://www.openwebanalytics.com

It's great to see that donor logs are going into a database instead
of just a text file, but multiple regression in SQL is absurdly
difficult because of SQL's limitations, so I still recommend R,
in particular: http://cran.r-project.org/web/packages/RMySQL/RMySQL.pdf
and http://wiener.math.csi.cuny.edu/Statistics/R/simpleR/stat006.html
I will ask Arthur Richards for data coding formats.
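To illustrate the regression step recommended above for R rather than SQL, here is the single-predictor closed form in plain Python; the data points are hypothetical (donation amount versus number of earmark options shown), and a real analysis would of course be multivariate.

```python
def least_squares(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept).
    Illustrates why this is easy in a statistics environment and painful
    in SQL -- it is just a couple of sums over centred deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: mean donation vs. number of earmark options shown.
slope, intercept = least_squares([0, 5, 10, 25], [1.0, 1.4, 1.9, 3.0])
```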

I predict that multiple-response checkboxes will do better than the
more constraining radio buttons, but there is no reason that they
should not be measured like any other independent variable. It is
probably a lot more important to measure the number of earmarks
offered: 0-26.  There is plenty of reason to believe that showing 26
options will have a slight advantage over 25, but I can't see the test
results from the Red Cross (they measure the things which increase
donations of blood much more carefully than money, at least in their
publications that I've been able to find.) Don't forget the control
case where no donor selections are offered. Optimization requires
measurement, and it is easy to measure offering a lot of options up
front.

Do you think that variations on the disclaimer should also be tried?
I think there is reason to believe something terse might result in
more donations, e.g.: "These options are advisory only." and/or "The
Wikimedia Foundation reserves the right to override donor selections,
cancel any project, and use any funds for any purpose." and/or "All
donations are discretionary; these options are offered for polling
purposes only." or some combination.  What does Mike Godwin think a
good set of disclaimers to test might be?

I consolidated the proposed stimulus list down to 25 non-default items
and enumerated them with letters of the alphabet so that everyone
would understand that it is feasible to test additional proposals as
well.  I have not yet surveyed the Village Pumps or mailing lists for
additional stimulatory ideas but I hope people who have or who see
anything missing will suggest at least five more. Translations would
be great, too.

(default) Use my donation where the need is greatest.
A. Auction the order of search failover links to search engine companies.
B. Broaden demographics of active editors.
C. Compensate people who submit improvements to the extent that they
are necessary and sufficient.
D. Display most popular related articles.
E. Enhance automation of project tasks.
F. Enhance site performance in underserved geographic regions.
G. Enhance visualizations of projects and their editing activity.
H. Establish journalism awards, expense accounts and compensation for
independent Wikinews reporters, fact checkers, photographers and
proofreaders.
I. Establish secure off-site backup copies.
J. Establish simple Wikipedias for beginning readers in languages
other than English.
K. Improve math formula rendering.
L. Increase the number of active editors.
M. Increase the number of articles, images, and files.
N. Increase the number of unique readers.
O. Make it easier for people to add recorded audio pronunciations.
P. Obtain expert article assessments.
Q. Obtain reader quality assessments.
R. Perform external code reviews.
S. Perform independent usability testing.
T. Produce regular snapshots and archives.
U. Retain more active editors.
V. Strengthen Wikimedia Foundation financial stability.
W. Support a thriving research community.
X. Support an easier format to write quiz questions.
Y. Support more reliable server uptime.
Z. Support offline editing.



[Wikitech-l] fundraising earmarks for code review, image bundle dumps, search failover auction, Wikinews independence, bugzilla queue maintenance etc.

2010-09-03 Thread James Salsman
Tim Starling wrote:

 As for fundraising, the work is uninspiring, and I don't think we've
 ever managed to get volunteers interested in it regardless of how open
 we've been.

I must take exception to that because I did a lot of work last year on
several aspects of fundraising, including button design, some of which
(e.g. the proposed button with Jimbo's face on it) wasn't A/B tested
even after the A/B test harness had been developed. I was never
told why there was no A/B test of that button.  It seems like I had to
ask over and over before anyone even did any A/B tests in the first
place.  Frankly, my efforts to help with fundraising are more
inspiring than a lot of the other things I try to do to help, but
inspiration is generally orthogonal to frustration. However, I know
one of my responsibilities as a volunteer to keep asking until things
get done.  Furthermore, how do you expect effective help with
fundraising when the fundraising mailing list and archives are closed?

Danese Cooper wrote:

 1. Eliminate single points of failure / bottlenecks

I am glad that is the top priority, because there are clearly failures
and bottlenecks in external code review, production of image bundle
dumps, auctioning search failover links to wealthy search engine
donors, steps to make Wikinews an independent, funded, and respected
bona fide news organization, general bugzilla queue software
maintenance, etc.

About eight months ago I was told that fundraising this year would
allow donors to pick an optional earmark for their funds.  Is that
still the plan?

Donors should be allowed to optionally mark their donations for
projects including (1) the review of externally submitted code, (2)
the production of image bundles along with the dumps, (3) auctioning
to wealthy search engine donors the order of appearance of several
search failover gadget links to external search engines (such as users
were able to use before the usability project rendered them unusable),
(4) a way to pay people who work on the bugzilla queue
(e.g. through http://odesk.com or the like) without having to set up
lengthy contracts, and (5) a way to pay for Wikinews journalism
awards, travel expenses, reporters, fact checkers, photographers,
camera and recording equipment, and proofreaders, etc.

Are there any reasons not to allow donors to earmark categories?  I am
not saying that those are the only earmarks which should be offered,
but I am certain that at least those five should be included.

What are other problems which might be solved by donor earmarks?
There are ten rejected GSoC projects which I feel strongly about
because they were scored positively by the mentors but rejected
because of the number of slots requested. Could those be funded by
donor earmarks?

Regards,
James Salsman



Re: [Wikitech-l] fundraising earmarks for code review, image bundle dumps, search failover auction, Wikinews independence, bugzilla queue maintenance etc.

2010-09-03 Thread James Salsman
Platonides wrote:

... And if you want WMF to have its employee do X, the pay would be
 'I give Y money to WMF if they fix this first'? That seems a bit awkward.

It would be best to follow the pattern the Red Cross uses: offering
either "where needed most" as the default, or a handful of alternative
options:
http://american.redcross.org/site/PageServer?pagename=ntld_main

Jean-Marc van Leerdam wrote:

 Well, if you want to keep some control over destinations of the
 donations you could allow to earmark up to 50% of the donation

Yes, you could do that (with a footnote or similar disclaimer), and/or
associate a certain amount with some of the earmark options after
which those would no longer be available for selection (i.e., after
they were fully funded).  Earmarking options could be offered in the
order they score as maximizing total giving, until the closed-ended
items with a maximum budget are fully funded.  (Each could have its
own goal thermometer shown.  After all the closed-ended earmarks are
satisfied, only the open-ended projects would remain in the order that
donors find them most inspiring.)

Platonides wrote:

 The idea of earmarking for minor donations is good, but it should
 not be readily available ... while not completely hidden, either.

Absolutely; a multivariate linear regression test to determine the
extent to which each of the earmark options tends to maximize total
contributions should be run in advance, with a sample size (assuming
30 earmark possibilities offered four at a time in a variety of
different languages and locales) of between 5000 and 30,000 donations.

The dependent variable would be the total amount given, while the
independent variables would be binary flags indicating whether each
option appeared in a given test (donation).  Here is a link to
statistics resources to help with multivariate linear regression:
http://www.statmethods.net/stats/regression.html
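The regression design described above could be sketched as follows. This is a minimal example assuming numpy is available; the donation amounts, option count, and per-option effect sizes are all synthetic and invented for illustration, not real fundraiser data:

```python
import numpy as np

rng = np.random.default_rng(0)

n_donations = 5000
n_options = 4  # e.g. four earmark options shown per donation form

# Binary design matrix: 1 if an option was shown with the donation form.
X = rng.integers(0, 2, size=(n_donations, n_options)).astype(float)

# Synthetic ground truth: a baseline gift plus a per-option lift (invented).
true_lift = np.array([5.0, -1.0, 2.5, 0.0])
amounts = 20.0 + X @ true_lift + rng.normal(0, 3, n_donations)

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(n_donations), X])
coef, *_ = np.linalg.lstsq(design, amounts, rcond=None)

baseline, lifts = coef[0], coef[1:]
# Options could then be ranked by their estimated lift to total giving.
ranking = np.argsort(lifts)[::-1]
```

With enough observations, the estimated lifts recover the per-option effects, giving the ordering in which the earmarks tend to maximize total giving.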

As for the earmarks, in addition to the five I suggested earlier and
the ten approved but un-slotted Google Summer of Code projects, Sue
has a list of 15 open-ended goals which could be used.  There is
ample opportunity to run more than just 30 earmarking options.  I'm
sure people could suggest others, either that they think of themselves
or find on their favorite mailing lists or village pumps.

Best regards,
James Salsman



Re: [Wikitech-l] fundraising earmarks for code review, image bundle dumps, search failover auction, Wikinews independence, bugzilla queue maintenance etc.

2010-09-03 Thread James Salsman
Ryan Kaldari wrote:

... There's definitely a lot of work that we need help with,
 so any assistance is appreciated!

What can I help with to prepare for experimental measurements?  Do you
already have a way to collect arbitrary radio button and checkmark
form responses from your PayPal donations? What format do those get
logged in?  I would love to write an R script for doing the regression.

Do you think this sort of thing would work better with radio buttons
like the Red Cross uses, or a set of checkmarks with language
specifying that the funds would be earmarked in equal proportions
between all the checked options -- or is that another independent
variable which should be tested?

Have you looked into http://www.wepay.com?  They are supposed to offer
a lower overhead rate than PayPal. I know you have an account with
moneybookers.com -- have you asked each of them for a better deal, to
get some competition going between them?

Aryeh Gregor wrote:

 On Fri, Sep 3, 2010 at 6:11 PM, James Salsman jsalsman at gmail.com wrote:
 Absolutely; a multivariate linear regression test to determine the
 extent to which each of the earmark options tends to maximize total
 contributions should be run in advance, with a sample size (assuming
 30 earmark possibilities offered four at a time in a variety of
 different languages and locales) of between 5000 and 30,000 donations.

 Is that a practical number?

According to http://wikimediafoundation.org/wiki/Special:FundraiserStatistics
there were more than 3,600 contributions on the second day of the 2008
fundraiser. During the first few days the independent variables should
be presented in random permutations. After you've collected enough
data for your desired confidence level (I used 95%), you can start
sorting them.  But if you want to use a lower level of confidence, you
can vastly reduce the number of initial observations.  If you want to
use a 90% confidence level for 30 independent variables, then you
would need fewer than 290 observations (donations).
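The random-permutation presentation during the data-collection phase could look something like this sketch. The 30-option pool and the four-at-a-time layout come from the earlier message; the function names and record format are invented for illustration:

```python
import random

N_OPTIONS = 30   # pool of candidate earmark options
SHOWN = 4        # options displayed per donation form

def pick_options(rng=random):
    """Return a random 4-option subset in random order, so that each
    option's appearance is an independent binary flag across donations."""
    return rng.sample(range(N_OPTIONS), SHOWN)

def record(amount, rng=random):
    """Log one donation: which options were shown, and the amount given,
    in the binary-flag form the regression needs."""
    shown = set(pick_options(rng))
    flags = [1 if i in shown else 0 for i in range(N_OPTIONS)]
    return {"flags": flags, "amount": amount}
```

Each logged record then contributes one row to the design matrix (the flags) and one value of the dependent variable (the amount).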

Ryan Kaldari wrote:

... There are a few potential problems with such a system:
 * More overhead for managing donations

In most cases, could this be addressed by disclaimers?  E.g., the
Foundation reserves the right to cancel earmarked projects for any
reason, and to override donor selections if funds fall short in any
essential areas?

 * The Foundation is trying to move away from any type of strings
 attached to donations (including grants) so that resources can be
 managed optimally and flexibly

If that presents an actual problem with small donations, under a
sufficiently flexible disclaimer, please let me know why.

 I do think, however, that such an earmarking system would make donating
 more attractive to some people

Isn't the reason the Red Cross does it that it substantially increases
donations?  Rand Montoya said he had measured that, and although I
forget the numbers, I remember it was a very significant difference.

David Gerard wrote:

 Unless the donation is really quite substantial, this may not be
 entirely worth the effort.

I know Rand said the effect was substantial, but it varies so much
with all of the different permutations that there is only one way to
find out the extent: to measure it experimentally with actual donors.
Merely discussing the possibilities cannot yield even a vague idea of
how much the presentation of each option serves to maximize donations.

Best regards,
James Salsman



Re: [Wikitech-l] Flash and other proprietary technologies on Wikimedia projects

2010-08-05 Thread James Salsman
Guillaume Paumier wrote:

 I don't think there is an official Board resolution about the use of
 proprietary technologies on Wikimedia projects. However, Brion and Erik
 have been known to have a pretty strong opinion on that, and I believe
 Danese and a large part of the WMF tech staff are in the same place.

 A few relevant links for a historical perspective:

 * We should permit Flash video playback thread on foundation-l in 2007
 http://thread.gmane.org/gmane.org.wikimedia.commons/2220/

 * Software policy draft thread on foundation-l in 2007
 http://thread.gmane.org/gmane.org.wikimedia.foundation/19547/

 * The actual draft:
 http://meta.wikimedia.org/wiki/Wikimedia_Draft_Statement_of_Principles_Regarding_Software_Use

There is nothing in that, or in
http://meta.wikimedia.org/wiki/File_format_policy, that suggests
we can't use Flash for microphone audio upload, is there?  Are people
aware of http://haxe.org/doc/intro and
http://www.gnu.org/software/gnash/ ? The bulk of Flash is no longer
proprietary.  I know there are patent issues around some Flash video
formats, but at this point I have little confidence that any of the
major browser authors will provide HTML microphone upload in the next
five years.  Is there any reason to believe otherwise?

Casey Brown wrote:
 Another, somewhat more recent one:
 http://wikimediafoundation.org/wiki/Minutes/October_3-5,_2008#Open_Standards_.2F_Free_File_Formats

 The board asked Sue to have Mike Godwin revise the draft policy to a
 version that would make it clear that only free formats are
 permissible.

 Did that ever happen?  (Or did anything useful ever come about of it?)

Clearly not, so I am asking Sue and Mike directly by adding them as addressees.

I have been working on microphone audio upload since before the
previous decade: http://www.w3.org/TR/device-upload -- I have also
offered to donate some nice ActionScript microphone upload code to the
Foundation which compiles with Haxe if the builder is willing to do
such things as replace the Speex codec constant with the equivalent
integer. It doesn't run under gnash yet, but I believe it will soon.
(I don't think there would be consensus for dropping Wikimedia support
for closed-source browsers, as a related matter.)

In return, I have asked the Foundation to spend $2,500 on a contract
with Yaron Koren to enable GIFT -- http://microformats.org/wiki/gift
-- in the Quiz extension.  That would be particularly useful if the
efforts to ask the Open University to re-license the several thousand
hours of courseware which they currently publish under cc-by-nc-sa, to
cc-by-sa or cc-by succeed.  I have asked multiple parties, including
Board members and the UK Chapter to work on that simultaneously.  I
believe at least two of them are working on that effort.  In any case,
GIFT is far more compact and more wikitext-like than the existing Quiz
extension to MediaWiki, which is bulky and little-used (appearing in
no more than 90 assessments on Wikiversity, for example), while GIFT
assessments can be produced from the assessments in any Moodle course
using Moodle's export function.

However, even though Wayne Mackintosh of the 25,000 teacher-strong
WikiEducator and OER Foundation wrote to Erik back on March 28, saying
they were very supportive of the GIFT compatibility project, Erik
has so far hesitated, saying that he wants to see additional support
from the community.

So if you think GIFT assessment support and/or Flash microphone audio
upload is a good idea (and I would repeat that the Spanish Wiktionary
still has no audio pronunciation for "hola" even though the English
Wiktionary does), then please let Erik know.  Thank you!

Sincerely,
James Salsman



Re: [Wikitech-l] Take me back too hip

2010-07-20 Thread James Salsman
jida...@jidanni.org wrote:

 The first words the user* sees on every page are "Take me back"

I agree that link should be renamed with different text.  I've already
seen two people confuse it with the back button's functionality,
thinking they needed to click it after logging in to get back to the
page they had to log in to create.  Those users, who were never in
Monobook to begin with, were taken back to somewhere they had
never been before, didn't know what to do when they got there, and
weren't very happy about either of those facts.

May I suggest "Use legacy interface" or "Abandon new interface"?



[Wikitech-l] page load time statistics

2010-07-02 Thread James Salsman
Regarding http://meta.wikimedia.org/wiki/Talk:Statistics#Page_load_times
does anyone have any recommendations for open source alternatives to
tools such as http://loadimpact.com/pageanalyzer.php ?

Thank you.

Regards,
James Salsman



[Wikitech-l] image bundles' size?

2010-05-16 Thread James Salsman
How large would the projects' image bundles be uncompressed, if they
were to exist?

Also asked at:

http://meta.wikimedia.org/wiki/Talk:Data_dumps#How_big_would_image_bundles_be_if_they_existed

However, someone suggested I should be on wikitech-l more, so I
thought I would try asking here.  I read here regularly, but I prefer
the dogfood.  I promise to send the answer to the other place the
question was asked if someone else doesn't do so first.

This is for a mirror project, which I want to fork into a peer-to-peer
wiki system.  A serious problem with peer-to-peer wikis is edit
conflict resolution -- almost everything else about syncing is not as
hard as that in general.  People often make the mistake of using git
as a metaphor, but code merges are much more tightly coupled with the
text being edited than edits to most Foundation projects' prose are.
Edit conflict resolution is especially hard in peer-to-peer mode; for
example, the two editors in conflict may be unavailable, and the
person faced with the conflict may not understand the original or
either of the two edited versions. We could use crowdsourcing systems
to, for example, have three people try to resolve each nontrivial
conflict, and three more people decide which of the first three
resolutions was best; if there isn't substantial agreement, get a
fourth proposal in light of the first three, etc.  Can anyone think of
a way to motivate volunteers to resolve edit conflicts as a third
party?
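The three-resolvers, three-judges scheme sketched above could be modeled like this. The quorum size of three and the "substantial agreement" rule come from the message; the function name and data structures are invented for illustration:

```python
from collections import Counter

def resolve_conflict(proposals, votes, quorum=3):
    """proposals: candidate merged texts from `quorum` third-party resolvers.
    votes: indices into proposals, one per judge.
    Returns the winning text if a strict majority of judges agree on one
    proposal, else None, signalling that a fourth proposal should be
    solicited in light of the first three."""
    if len(proposals) < quorum or len(votes) < quorum:
        return None  # not enough participants yet
    tally = Counter(votes)
    best_idx, best_count = tally.most_common(1)[0]
    if best_count > len(votes) / 2:  # "substantial agreement"
        return proposals[best_idx]
    return None
```

For example, judges voting [0, 0, 1] select proposal 0, while a three-way split [0, 1, 2] returns None and triggers another round.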

Sincerely,
James Salsman
