[liberationtech] Data ethics workshop: call for participation
Hi all,

I am writing on behalf of the KDD Data Ethics Workshop (http://dataethics.github.io), which I am helping organize on August 24 in New York. KDD (Knowledge Discovery and Data Mining) is one of the top data science conferences, and this year's theme is Data Mining for Social Good, so the conference itself should be interesting to many liberation tech folk. The conference is generally well-attended by data scientists and engineers from the tech companies that are shaping our data policies, so there is potential for influence that is more direct than at NGO- and government-focused events.

For the data ethics workshop, we are interested in both positive and negative potential outcomes. Formats include position papers, case studies, and extended abstracts. Full call for participation below. I hope many of you are able to take part!

best,
Rob

CALL FOR PARTICIPATION

*** DATA ETHICS: KDD WORKSHOP ***
Sunday, August 24, 2014 in NYC
http://dataethics.github.io

SUBMISSION DEADLINE: June 8, 2014

Addressing a broad spectrum of ethical issues in data collection, storage, analysis, and sharing, the Data Ethics workshop will be a forum to explore data science's potential ethical implications -- both positive and negative -- for data analytics practitioners and researchers in academia and industry. Perspectives from the humanities and social sciences are welcome. The workshop is held in conjunction with the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'14) in New York City.

- TOPICS

* Balancing transparency/openness vs. privacy/security
* Intentional and unintentional impact
* Balancing reward vs. risk of data usage
* Data use and re-use
* Theory vs. practice in data ethics
* Case studies of ethical issues that have arisen in data science
* Hippocratic Oath for Data Scientists
* Data ethics and the law
* Commercial / economic dimensions of data ethics
* Safe and effective structures for data philanthropy
* Data ownership vs. data as a public good
* What data can/should be collected in public
* Surveillance technologies: pros and cons
* Data anonymization/scrubbing, and data de-anonymization
* Cross-cultural differences in data ethics
* Human data processing and the ethics of microtasking/crowdsourcing
* Development of ethical norms and/or suggested checklists for data practitioners

- CALL FOR PARTICIPATION

In addition to traditional papers, submissions of case studies, position papers, posters, and extended abstracts are also encouraged.

SUBMISSION DEADLINE: June 8, 2014
WORKSHOP: Sunday, August 24, 2014 in NYC

For more information or to submit your work, please visit: http://dataethics.github.io

--
CEO, Idibon
www.robertmunro.com
www.idibon.com

--
Liberationtech is a public list whose archives are searchable on Google. Violations of list guidelines will get you moderated: https://mailman.stanford.edu/mailman/listinfo/liberationtech. Unsubscribe, change to digest, or change password by emailing moderator at compa...@stanford.edu.
Re: [liberationtech] SMS questions
Take the advice *not* to use SMS. I'd also avoid any NGO software that insists it was written for humanitarian purposes: this branding is usually skin-deep, and such tools are often less secure than off-the-shelf software. There are exceptions, like much of what Benetech produces, but if you need to ask lists about security and you are working from scratch on a tight timeline, like you say, then you are not in a position to adequately evaluate the pros and cons.

If your main concern is that election monitoring reports are being read by the local government while in transit via the phone networks, then I would recommend email rather than SMS, and have the reporters use an email provider that defaults to SSL (like Gmail). This assumes that you are not worried about the following:

1- the local government knowing about the *existence* of the system, if not the content of every report.
2- the identities of reporters being discovered.
3- the consequences if individual reporters and/or their devices in the country are physically compromised.

If the security situation is critical enough that any of these three points concerns you, then you should probably avoid digital reporting entirely, or find someone qualified in security to take the lead. Otherwise, there's a good chance you'll just be helping the local government identify their wanted dissidents, and ultimately do more harm than good.

Rob

ps: Is the small far, far away country Luxembourg or Andorra?

On 28 August 2013 15:40, elijah eli...@riseup.net wrote:

On 08/27/2013 09:36 AM, Richard Brooks wrote: I have colleagues living in a small country, far, far away with a history of rigged elections who want to put in place a system for collecting information using SMS. The local government keeps shutting down the systems that they put in place.

As you probably know, the main solutions people use for this are Ushahidi or FrontlineSMS, but neither of these is secure enough for your needs, I think.

FrontlineSMS has a good rundown of risks here:
http://www.frontlinesms.com/wp-content/uploads/2011/08/frontlinesms_userguide.pdf

Guardian created a fork of the Ushahidi Android app to support encrypted transport, but it requires a data plan (and maybe isn't maintained?):
https://guardianproject.info/2010/03/10/ushahidi-linda-testimony-protection/

If you want secure reporting with SMS as the transport, I think your only option is moxie's TextSecure Android app. This will not help in processing the reports, but it will allow the reports to be securely submitted. The government will still be able to identify and shut down this approach, by identifying which devices are sending encrypted SMS messages or by blocking the number that reports are submitted to.

The final option is to use SMS over satellite phones. Supposedly this works very well, but it is monstrously expensive.

-elijah

--
Idibon
www.idibon.com
www.robertmunro.com
Re: [liberationtech] Opinion on a paper?
I second the criticism about the assumptions of a 'perfect population register'. This is a much broader problem, as shown by the Netflix case. For a good synopsis, see Pete Warden's take on the problem, some examples of how external data can be used to help reverse anonymized data, and some suggestions for ways to operate with imperfect anonymization:
http://strata.oreilly.com/2011/05/anonymize-data-limits.html

You certainly don't need to be high-profile, either, like the article suggests. Last year I was working on disease outbreak tracking. There was an actual case where a girl in East Africa had been reported as testing positive for Ebola. Her village was named in reports, and this was a region where victims of diseases are often vilified and sometimes killed. She would likely have been the only person from her village who was rushed to a hospital at that time (and more likely still the only girl of her age bracket). It would have been simple for everyone from her village to immediately make the connection. We decided we would not publish this information, even though many other health organizations did. Her diagnosis was ultimately incorrect, which doesn't really affect the anonymization issue, but it makes any identification/vilification even more disturbing.

We were information managers and health professionals, not lawyers, and the international aspect no doubt complicates things. I assume that the health organizations who did publicize this acted within the law. For us, that wasn't enough. If it were reported in a health journal 5 years later? That might be OK. But as a real-time report it was clearly unethical. I doubt the other organizations published this out of malice -- it was one piece of information among many -- but it highlights the problem.
Rob

On 9 September 2012 15:30, Joss Wright joss-liberationt...@pseudonymity.net wrote:

On Sun, Sep 09, 2012 at 07:19:22PM +, Paul Bernal (LAW) wrote:

I wondered if anyone had an opinion on it - I don't have the technical knowledge to be able to evaluate it properly. The basic conclusion seems to be that re-identification of 'anonymised' data is not nearly as easy as we had previously thought (from the work of Latanya Sweeney, Paul Ohm etc). Are these conclusions valid? My concern is that I can see this paper being used to justify releasing all kinds of potentially risky information -- particularly health data, which could get into the hands of insurance companies and others who could use it to the detriment of individuals. On the other hand, if the conclusions are really valid, then perhaps people like me shouldn't be as concerned as we are.

Hi Paul,

I've gone over this paper quite quickly, partially because it's late here and I should be asleep; apologies for any bizarre turns of phrase, repetition (hesitation or deviation...), or bad-tempered comments. :) I'll also certainly defer to the hardcore reidentification experts if they turn up.

(This email has become slightly longer than I intended. To sum up: Lots of problems. False assumptions. Cherry-picked examples. Ignores or wholly misunderstands the subsequent decade of research. Somewhat misrepresents statistics. Wishful-thinking recommendations. Correct in stating that we don't need to delete all data everywhere in order to avoid reidentification, but that's about it.)

My initial response is that the paper is partially correct, in that the Sweeney example was a dramatic, anecdotal demonstration of reidentification and shouldn't be taken as representative of data in general. On the other hand, the paper goes wildly off in the other direction, and claims that the specifics of the Sweeney example somehow demonstrate that reidentification in general is barely feasible and can easily be handled with a few simple rules of thumb. Overall, I would say that there are a number of serious flaws in the author's arguments.

Firstly, the paper is predicated almost entirely on what the author refers to as 'the myth of the perfect population register' -- the observation that almost no realistic database covers an entire population, and so any apparently unique record could in fact also match someone outside the database. This is certainly true, but the author uses it to justify an assumption that, in my opinion, does not hold. This assumption, the largest conceptual flaw in the paper, is that a reidentification has to be unique and perfect to be of any value. The author claims, based on the 'perfect population register', that because some reidentified record -- relating to, say, the health information of an individual -- could potentially match someone who wasn't in the database, there is no guarantee that the record is accurate, and thus the reidentification is useless. This is not true -- even partial or probabilistic reidentifications reduce the set of possibilities, and reveal information regarding an individual. This can be used and
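The point about partial reidentification can be made concrete with a toy sketch. Everything below is invented for illustration (the records, the quasi-identifiers, and the helper function are not from any real dataset): it shows how an attacker who knows only a few public attributes of a target can shrink the candidate set in an "anonymized" release, leaking information even when no match is unique.

```python
# Toy demonstration of partial reidentification via quasi-identifiers.
# All records are fabricated; 'zip', 'age' and 'sex' play the role of
# quasi-identifiers that an attacker might know from public sources.

anonymized_health_records = [
    {"zip": "02138", "age": 34, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "02138", "age": 35, "sex": "F", "diagnosis": "flu"},
    {"zip": "02139", "age": 34, "sex": "M", "diagnosis": "asthma"},
    {"zip": "02141", "age": 60, "sex": "F", "diagnosis": "hypertension"},
]

def candidate_records(records, **known_attributes):
    """Return every record consistent with what the attacker knows."""
    return [r for r in records
            if all(r.get(k) == v for k, v in known_attributes.items())]

# Knowing only that the target is a woman in ZIP 02138 already
# narrows 4 records down to 2.
partial = candidate_records(anonymized_health_records,
                            zip="02138", sex="F")
print(len(partial))  # 2

# Adding age makes the match unique within the release. Even if the
# target might not be in the database at all, the attacker has learned
# that *if* she is, her diagnosis is diabetes -- a probabilistic
# disclosure, not a worthless one.
exact = candidate_records(anonymized_health_records,
                          zip="02138", sex="F", age=34)
print([r["diagnosis"] for r in exact])  # ['diabetes']
```

The point of the sketch is exactly the one made above: each attribute the attacker learns filters the candidate set further, so a reidentification does not need to be certain or unique to reveal sensitive information.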
[liberationtech] First full report on the largest humanitarian crowdsourcing initiative to date
The first full report about Mission 4636, Crowdsourcing and Crisis-affected Community, is now at:
http://www.mission4636.org/report/

The page contains a link to the report, which will be published in the Journal of Information Retrieval, a summary of findings/recommendations, and the comments from the Haitian community.

Mission 4636 was a predominantly Haitian initiative that I coordinated in the wake of the 2010 earthquake in Haiti. It was the first time that crowdsourcing (microtasking) had been used for humanitarian response, and it is still the largest deployment of its kind -- larger than the next 10 deployments combined.

In summary, the report has the following findings:

1. The greatest volume, speed and accuracy in information processing was by Haitians and those working most closely with them.
2. Previous reports about Mission 4636 have incorrectly credited international organizations with the majority of the work, often inflating the 5% of data that went through the software of international not-for-profits to look like 100% of the initiative.
3. No new technologies played a significant role in Mission 4636, which is again contrary to most reports to date.
4. Crowdsourcing (microtasking) was an effective strategy for structuring and translating information into reports that the responders in the US Military could act on.
5. The online chat was vital for information sharing, as no one person could know all the possible locations and translations, but someone among the collaborating volunteers often did.
6. Among social media platforms, Facebook was by far the most important, which is contrary to most research on social media for emergency management, which has focused on Twitter.
7. Translation was the largest and most important information processing task, followed by categorization, and then by geolocation and structuring information about missing people.
8. The use of a public-facing 'crisis map' was opposed by the majority of people within Mission 4636 and exposed the identities of at-risk individuals.
9. The majority of volunteers came together through social media and strong social ties.
10. A quarter of all crowdsourced information processing was by paid workers within Haiti, who were one of the most vital workforces but have also been excluded from most other reports to date.
11. The most important connections to the country were through the volunteers themselves, with direct relationships to the people managing the clinics, radio stations, and individual people that we were supporting.

From the findings in the report, the following recommendations are made for organizations or individuals considering the use of crowdsourcing in response to future disasters:

1. Find and manage volunteers via strong social ties.
2. Maintain a ten-to-one local-to-international workforce.
3. Default to private data practices.
4. Publish in the language of the crisis-affected community.
5. Do not elicit information for which there is not the capacity to respond.
6. Do not elicit emergency response communications.
7. Use social media to encourage the centralization of information.
8. Establish partnerships with technology companies.
9. Avoid partnerships with media organizations and citizen journalists.
10. Integrate, don't innovate or disrupt.
11. Employ people with close ties to the crisis-affected region.

The majority of the report is an analysis of how the Mission 4636 volunteers and workers collaborated online to structure, filter and share information among people within Haiti and among the response community. The particular focus is on the diaspora, and the argument is that the diaspora were the key to new methods of information sharing during a crisis, not the technology they happened to be using.
Having said that, subscribers to this list might be interested to know that, within the small role played by international engineers, 90% of the management was by Stanford alums.

Rob Munro
--
www.robertmunro.com