[liberationtech] Data ethics workshop: call for participation

2014-05-22 Thread Robert Munro
Hi all

I am writing on behalf of the KDD Data Ethics Workshop (
http://dataethics.github.io), which I am helping organize on August 24 in
New York.

KDD (Knowledge Discovery and Data Mining) is one of the top data science
conferences and this year's theme is Data Mining for Social Good, so the
conference itself should be interesting to many liberation tech folk. The
conference is generally well-attended by data scientists and engineers from
the tech companies that are shaping our data policies, so there is a
potential for influence that is more direct than NGO and government-focused
events.

For the data ethics workshop, we are interested in both positive and
negative potential outcomes. Formats include position papers, case studies
and extended abstracts. Full call for participation below.

I hope many of you are able to take part!

best

Rob



CALL FOR PARTICIPATION

*** DATA ETHICS: KDD WORKSHOP ***
Sunday, August 24, 2014 in NYC
http://dataethics.github.io
SUBMISSION DEADLINE: June 8, 2014

Addressing a broad spectrum of ethical issues in data collection, storage,
analysis, and sharing, the Data Ethics workshop will be a forum to explore
data science's potential ethical implications -- both positive and negative
-- for data analytics practitioners and researchers in academia and
industry. Perspectives from the humanities and social sciences are welcome.

The workshop is held in conjunction with the 20th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD'14) in New York City.

-
TOPICS
* Balancing transparency/openness vs. privacy/security
* Intentional and unintentional impact
* Balancing reward vs. risk of data usage
* Data use and re-use
* Theory vs. practice in data ethics
* Case studies of ethical issues that have arisen in data science
* Hippocratic Oath for Data Scientists
* Data ethics and the law
* Commercial / economic dimensions of data ethics
* Safe and effective structures for data philanthropy
* Data ownership vs. data as a public good
* What data can/should be collected in public
* Surveillance technologies: pros and cons
* Data anonymization/scrubbing, and data de-anonymization
* Cross-cultural differences in data ethics
* Human data processing and the ethics of microtasking/crowdsourcing
* Development of ethical norms and/or suggested checklists for data
practitioners

-
CALL FOR PARTICIPATION

In addition to traditional papers, submissions of case studies, position
papers, posters, and extended abstracts are also encouraged.

SUBMISSION DEADLINE: June 8, 2014

WORKSHOP: Sunday, August 24, 2014 in NYC

-
For more information or to submit your work, please visit:
http://dataethics.github.io



-- 
CEO, Idibon
www.robertmunro.com
www.idibon.com
-- 
Liberationtech is a public list whose archives are searchable on Google. Violations of 
list guidelines will get you moderated: 
https://mailman.stanford.edu/mailman/listinfo/liberationtech. Unsubscribe, 
change to digest, or change password by emailing moderator at 
compa...@stanford.edu.

Re: [liberationtech] SMS questions

2013-08-28 Thread Robert Munro
Take the advice *not* to use SMS. I'd also avoid any NGO software that
insists it was written for humanitarian purposes: this branding is
usually skin deep and these tools are often less secure than
off-the-shelf software. There are exceptions, like much of what Benetech produces,
but if you need to ask lists about security and you are working from
scratch on a tight timeline, like you say, then you are not in a
position to adequately evaluate the pros and cons.

If your main concern is that election monitoring reports are being
read by the local government while in transit via the phone networks,
then I would recommend email rather than SMS, and have the reporters
use an email provider that defaults to SSL (like Gmail).

This is assuming that you are not worried about the following things:
 1- the local government knowing about the *existence* of the system,
if not the content of every report.
 2- the identities of reporters being discovered.
 3- the implications of individual reporters and/or their devices in
the country being physically compromised.

If the security situation is critical enough that any of these three
points concerns you, then you should probably avoid digital reporting
entirely, or find someone qualified in security to take the lead.
Otherwise, there's a good chance you'll just be helping the local
government identify their wanted dissidents, and ultimately do more
harm than good.

Rob

ps: Is the small far, far away country Luxembourg or Andorra?





On 28 August 2013 15:40, elijah eli...@riseup.net wrote:
 On 08/27/2013 09:36 AM, Richard Brooks wrote:

 I have colleagues living in a small country, far, far
 away with a history of rigged elections who want to
 put in place a system for collecting information
 using SMS. The local government keeps shutting
 down the systems that they put in place.

 As you probably know, the main solutions people use for this are
 Ushahidi or FrontlineSMS, but neither of these are secure enough for
 your needs, I think.

 FrontlineSMS has a good rundown of risks here:

 http://www.frontlinesms.com/wp-content/uploads/2011/08/frontlinesms_userguide.pdf

 Guardian created a fork of the Ushahidi android app to support encrypted
 transport, but it requires a data plan (and maybe isn't maintained?):

 https://guardianproject.info/2010/03/10/ushahidi-linda-testimony-protection/

 If you want secure reporting over SMS as the transport, I think your
 only option is moxie's TextSecure android app. This will not help in
 processing the reports, but it will allow the reports to be securely
 submitted. The government will still be able to identify and shut down
 this approach by identifying which devices are sending encrypted SMS
 messages or by blocking the number that reports are submitted to.

 The final option is to use SMS over satellite phones. Supposedly, this
 works very well, but is monstrously expensive.

 -elijah



-- 
Idibon
www.idibon.com
www.robertmunro.com


Re: [liberationtech] Opinion on a paper?

2012-09-09 Thread Robert Munro
I second the criticism about the assumptions of a 'perfect population
register'. This is a much broader problem, as shown by the Netflix
case. For a good synopsis, see Pete Warden's take on the problem, some
examples of how external data can be used to help reverse anonymized
data, and some suggestions for ways to operate with imperfect
anonymization:
  http://strata.oreilly.com/2011/05/anonymize-data-limits.html

You certainly don't need to be high-profile, either, like the article
suggests. Last year I was working on disease outbreak tracking. There
was an actual case where a girl in East Africa had been reported as
testing positive for Ebola. Her village was named in reports and this
was a region where victims of diseases are often vilified and
sometimes killed. She would likely have been the only person from her
village who was rushed to a hospital at that time (and even more likely
the only girl in her age bracket). It would have been simple for everyone
from her village to immediately make the connection. We decided we
would not want to publish this information, even though many other
health organizations did. Her diagnosis was ultimately incorrect,
which doesn't really affect the anonymization issue, but it makes any
identification/vilification even more disturbing.
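
To make the reasoning above concrete, here is a minimal Python sketch of
checking how many records share the same combination of attributes. The
records, column order, and helper names are all hypothetical and invented
for illustration; none of the values come from the real case. A group of
size 1 means the "anonymized" record points at exactly one person.

```python
from collections import Counter

# Hypothetical records: (village, age_bracket, sex, week_hospitalized).
# All values are invented for illustration only.
records = [
    ("village_a", "10-19", "F", "week_1"),
    ("village_b", "30-39", "M", "week_1"),
    ("village_b", "30-39", "M", "week_1"),
    ("village_b", "20-29", "F", "week_2"),
]


def group_counts(rows, columns):
    """Count how many rows share each combination of values in `columns`."""
    return Counter(tuple(row[i] for i in columns) for row in rows)


def smallest_group_size(rows, columns):
    """Size of the smallest group of rows that look identical on `columns`."""
    return min(group_counts(rows, columns).values())


def unique_rows(rows, columns):
    """Rows that are the only ones with their combination of `columns`."""
    counts = group_counts(rows, columns)
    return [row for row in rows if counts[tuple(row[i] for i in columns)] == 1]


# Naming the village alone already isolates the single record from village_a.
k_village = smallest_group_size(records, columns=[0])

# Adding age bracket and sex isolates a second record as well.
singled_out = unique_rows(records, columns=[0, 1, 2])
```

In this toy data the village name alone is enough to single out one
record, which mirrors the situation described above: no name is needed
when the remaining attributes match only one person.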

We were information managers and health professionals, not lawyers,
and the international aspect no doubt complicates things. I assume
that the health organizations who did publicize this acted within the
law. For us, this wasn't enough. If it was reported in a health
journal 5 years later? That might be ok. But as a real-time report it
was clearly unethical. I doubt the other organizations published this
in malice - it was one piece of information among many - but it
highlights the problem.

Rob








On 9 September 2012 15:30, Joss Wright
joss-liberationt...@pseudonymity.net wrote:
 On Sun, Sep 09, 2012 at 07:19:22PM +, Paul Bernal (LAW) wrote:

 I wondered if anyone had an opinion on it - I don't have the technical
 knowledge to be able to evaluate it properly. The basic conclusion
 seems to be that re-identification of 'anonymised' data is not nearly
 as easy as we had previously thought (from the work of Latanya
 Sweeney, Paul Ohm etc). Are these conclusions valid?

 My concern is that I can see this paper being used to justify all
 kinds of potentially risky information being released - particularly
 health data, which could get into the hands of insurance companies and
 others who could use it to the detriment of individuals. On the other
 hand, if the conclusions are really valid, then perhaps people like me
 shouldn't be as concerned as we are.

 Hi Paul,

 I've gone over this paper quite quickly, partially because it's late
 here and I should be asleep; apologies for any bizarre turns of phrase,
 repetition (hesitation or deviation...), or bad-tempered
 comments. :)

 I'll also certainly defer to the hardcore reidentification experts if
 they turn up.

 (This email has become slightly longer than I intended. To sum up:
 Lots of problems. False assumptions. Cherry-picked examples. Ignores or
 wholly misunderstands subsequent decade of research. Somewhat
 misrepresents statistics.  Wishful-thinking recommendations. Correct in
 stating that we don't need to delete all data everywhere in order to
 avoid reidentification, but that's about it.)

 My initial response is that the paper is partially correct, in that the
 Sweeney example was a dramatic, anecdotal demonstration of
 reidentification and shouldn't be taken as representative of data in
 general. On the other hand, the paper goes wildly off in the other
 direction, and claims that the specifics of the Sweeney example somehow
 demonstrate that reidentification in general is barely feasible and can
 easily be handled with a few simple rules of thumb.

 Overall, I would say that there are a number of serious flaws in the
 arguments of the author.

 Firstly, the paper is predicated almost entirely on what the author
 refers to as `the myth of the perfect population register' -- that
 almost no realistic database covers an entire population, and so any
 apparently unique record could in fact also match someone outside of the
 database. This is certainly true, but is used by the author to justify
 an assumption that does not hold, in my opinion.

 This assumption, the largest conceptual flaw in the paper, is that a
 reidentification has to be unique and perfect to be of any value. The
 author claims, based on the `perfect population register', that because
 some reidentified record, relating to, say, health information of an
 individual, could potentially match that of someone that wasn't in the
 database, that there is no guarantee that the record is accurate, and
 thus the reidentification is useless. This is not true -- even such
 partial or probabilistic reidentifications reduce the set of
 possibilities, and reveal information regarding an individual. This can
 be used and 

[liberationtech] First full report on the largest humanitarian crowdsourcing initiative to date

2012-06-04 Thread Robert Munro
The first full report about Mission 4636, "Crowdsourcing and the
Crisis-Affected Community", is now at:
   http://www.mission4636.org/report/
The page contains a link to the report which will be published in the
Journal of Information Retrieval, a summary of
findings/recommendations, and the comments from the Haitian community.

Mission 4636 was a predominantly Haitian initiative that I coordinated
in the wake of the 2010 earthquake in Haiti. It was the first time
that crowdsourcing (microtasking) had been used for humanitarian
response and is still the largest deployment of its kind -- larger
than the next 10 deployments combined.

In summary, the report has the following findings:

1. The greatest volume, speed and accuracy in information processing
was by Haitians and those working most closely with them.
2. Previous reports about Mission 4636 have incorrectly credited
international organizations with the majority of the work, often
inflating the 5% of data that went through the software of
international not-for-profits to look like 100% of the initiative.
3. No new technologies played a significant role in Mission 4636,
which is again contrary to most reports to date.
4. Crowdsourcing (microtasking) was an effective strategy to structure
and translate information into reports that the responders among the
US Military could act on.
5. The online chat was vital for information sharing, as no one person
could know all the possible locations and translations, but someone
among the collaborating volunteers often did.
6. Among social media platforms, Facebook was by far the most
important, which is contrary to most research on social media for
emergency management that has focused on Twitter.
7. Translation was the largest and most important information
processing task, followed by categorization and then geolocation and
structuring information about missing people.
8. The use of a public-facing ‘crisis map’ was opposed by the majority
of people within Mission 4636 and exposed the identities of at-risk
individuals.
9. The majority of volunteers came together through social media and
strong social ties.
10. A quarter of all crowdsourced information processing was by paid
workers within Haiti, who were one of the most vital workforces but
have also been excluded from most other reports to date.
11. The most important connections to the country were through the
volunteers themselves, with direct relationships to people managing
the clinics, radio stations, and individual people that we were
supporting.

From the findings in the report, the following recommendations are
made for organizations or individuals considering the use of
crowdsourcing in response to future disasters:

1. Find and manage volunteers via strong social ties.
2. Maintain a ten-to-one local-to-international workforce.
3. Default to private data practices.
4. Publish in the language of the crisis-affected community.
5. Do not elicit information for which there is not the capacity to respond.
6. Do not elicit emergency response communications.
7. Use social media to encourage the centralization of information.
8. Establish partnerships with technology companies.
9. Avoid partnerships with media organizations and citizen journalists.
10. Integrate, don’t innovate or disrupt.
11. Employ people with close ties to the crisis-affected region.

The majority of the report is an analysis of how the Mission 4636
volunteers and workers collaborated online to structure, filter and
share information among people within Haiti and among the response
community. The particular focus is on the diaspora, and the argument
is that the diaspora were the key to new methods of information
sharing during a crisis, not the technology they happened to be using.
Having said that, subscribers to this list might be interested to know
that within the small role played by international engineers, 90% of
the management was by Stanford alums.

Rob Munro



-- 
www.robertmunro.com
___
liberationtech mailing list
liberationtech@lists.stanford.edu

Should you need to change your subscription options, please go to:

https://mailman.stanford.edu/mailman/listinfo/liberationtech

If you would like to receive a daily digest, click yes (once you click above) 
next to would you like to receive list mail batched in a daily digest?

You will need the user name and password you receive from the list moderator in 
monthly reminders. You may ask for a reminder here: 
https://mailman.stanford.edu/mailman/listinfo/liberationtech

Should you need immediate assistance, please contact the list moderator.

Please don't forget to follow us on http://twitter.com/#!/Liberationtech