Re: [datameet] Security Issues with the Voter List
Something I read today: http://www.medianama.com/2014/05/223-modak-marketing-election-voter-india/ -- Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org --- You received this message because you are subscribed to the Google Groups datameet group. To unsubscribe from this group and stop receiving emails from it, send an email to datameet+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [datameet] Security Issues with the Voter List
Dear Gautam,

Thank you. This is very interesting. I wrote a piece on this issue right after the failed Google-ECI deal in February: http://goo.gl/e9Xea0

The UK approach seems to be a good one. In the UK there are two voter lists: a full list and an edited list. You can choose to be removed from the edited list at the time of registration or at any time thereafter. The edited list is available in the public domain, and the full list is safeguarded by purpose limitation and UK Data Protection Law.

~Snehashish

On Mon, May 19, 2014 at 10:36 AM, Gautam John gkj...@gmail.com wrote:
Something I read today: http://www.medianama.com/2014/05/223-modak-marketing-election-voter-india/
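The two-list arrangement described above can be pictured as a simple filter. This is only a minimal sketch under stated assumptions: the `Elector` record and its `opted_out` flag are hypothetical names chosen for illustration and do not reflect how any election authority actually stores its register.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Elector:
    # Hypothetical record layout, for illustration only.
    name: str
    address: str
    opted_out: bool = False  # True if the elector asked to be left off the edited list


def edited_list(full_list: List[Elector]) -> List[Elector]:
    """Return the publicly available subset: every elector who has not opted out."""
    return [e for e in full_list if not e.opted_out]


full_list = [
    Elector("A. Voter", "1 Example Road"),
    Elector("B. Voter", "2 Example Road", opted_out=True),
]
public_names = [e.name for e in edited_list(full_list)]
```

The full list stays complete for electoral purposes, and only the filtered copy would be published.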
Re: [datameet] Security Issues with the Voter List
As a follow-up to this discussion: electoralsearch.in began to implement rate limiting and selective IP blocking yesterday. Sad as this is for my own research purposes, I welcome the step from a privacy point of view...

Raphael

On 11.04.2014 10:56, Chandrashekhar Raman wrote:
Raphael,

To clarify, I am not trying to make a case against the availability of fine-grained data; far from it. I'm with you on this argument, among others that are made spuriously to restrict access. I might have stretched the point, but then again, killing is just one extreme form of discrimination; there are others that are less visible. You summed it up very well: it's good to have a healthy caution and unease when dealing with some of this data, and there are probably no simple answers here. Will read the paper at leisure.

cs.
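For readers unfamiliar with the rate limiting mentioned in this message, here is a minimal sketch of one common technique, a per-client token bucket. Nothing in the thread says how electoralsearch.in implements its limits, so treat this purely as an illustration: the class names, the `capacity` and `rate` parameters, and the choice of keying on an IP string are all assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenBucket:
    """Permits bursts of up to `capacity` requests, refilled at `rate` tokens per second."""
    capacity: float
    rate: float
    tokens: float
    updated: float

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed, never exceeding capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per client key (here an IP address string, as an assumption)."""

    def __init__(self, capacity: float = 5, rate: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = float(capacity)
        self.rate = float(rate)
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(client_ip)
        if bucket is None:
            # A new client starts with a full bucket.
            bucket = TokenBucket(self.capacity, self.rate, self.capacity, now)
            self.buckets[client_ip] = bucket
        return bucket.allow(now)
```

A limiter like this leaves occasional individual lookups untouched while slowing down bulk retrieval from a single address. It does nothing against requests spread over many addresses, which is presumably why selective IP blocking was introduced alongside it.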
Re: [datameet] Security Issues with the Voter List
Hi Devdatta and Avinash,

yes, I, too, am frankly surprised at the ease with which one can access sensitive data in bulk. Not only PDF rolls and voter details, but also things such as land records, BPL lists, and much more - I think we are in an exciting as well as dangerous phase of fairly uncontrolled, nascent e-Governance practices. But I think the ethical issues here are a little more complex than mere privacy concerns.

Upfront, I must admit that I use all the above sources for academic research (in UP and across India). What Avinash described in principle and using the example of Delhi can indeed be done on an all-India scale, and I am sure there are more people than just me who do it. But then the social sciences have long dealt with sensitive data and developed protocols to protect it. Even though the data is publicly available, I for instance keep my own copy on a secure workstation with full disk encryption and two-factor authentication. Whenever possible, I also work on anonymized subsets of data. Yet there are other potential uses - some of the more worrisome you pointed out - which are not bound by such data protection standards. To me, this once more highlights the nascent stage of ethical standards around Big Data and eGovernance.

On the plus side, I am happy to have that kind of access to conduct research which will ultimately be ethically beneficial, leading to better understanding of social issues and potentially to better policy advice. Also, there is a point to be made that transparency is an important asset in elections in particular, not only in terms of individual electoral search functions, but also in terms of publicly accessible (and cross-checkable, publicly verifiable) PDF rolls.

Finally, a lot of this data had been available in the past as well, only in distributed and/or commercial form, which means there had been a hierarchy of access: small-time crooks could not use it, but large-time crooks were always able to use it. Likewise, scholars at large (often foreign) universities were able to use it, but not smaller ones (this is still true for some data, geodata in particular, which I can only access because of Ivy-League contacts and only process because of an association with Oxford University).

The ethical challenge as I see it thus comes not from data availability per se, but from the bulk accessibility and processability of data, as well as the potential to link otherwise disconnected datasets with each other (for instance a voter ID from the rolls to the online electoral search mechanism to that voter's polling booth locality to the ration card of a person with the same name registered at a ration shop in close spatial proximity to the amount of rice that person obtained last week, all coupled - in case of my own research - to that person's religious identity through a name-matching algorithm). And this IS an ethical challenge indeed, particularly if one leaves the ivory tower of academia, where ethical standards for such data are more ingrained, and more adhered to. One need not go all the way to the various criminal uses of such data - are we all happy with commercial use, to start with?

I have no easy answers here, because I think the ethical issue is fairly complex, balancing privacy and personal security against transparency in the political process and legitimate academic use of data (also because I think the answer must be found in India through political deliberation, and not in German academia). Still, in the end, I have to admit that I often leave my desk in the evening with quite some unease over the sheer wealth of private data that I work with... What do others think?

Raphael

On 11.04.2014 06:57, Avinash Celestine wrote:
Hi Devdatta

Yes, though (and in the current context, I suppose that's a good thing), it's not so easy for some other states such as UP, due to certain problems with the way the PDFs are encoded. Raphael, who is on this group, will testify to that...
I had alluded to this sometime back... https://storify.com/ac_soc/voter-profiling

Avinash

On Fri, Apr 11, 2014 at 9:55 AM, Devdatta Tengshe devda...@tengshe.in wrote:
Hi,

I found this interesting article by a guy who downloaded and processed the Voter list of Delhi: https://medium.com/p/1aff55526881

I found this via a discussion on Reddit: http://www.reddit.com/r/programming/comments/22pn8u/i_wrote_a_few_simple_python_scripts_to_retrieve/

I'd like to quote his findings here:

1. It is possible to automate the retrieval of every single PDF roll all across India
2. These PDFs can then be processed in a matter of minutes to produce details like Addresses, names, father's name, gender, age and voters ID number for every single registered voter of India
3. Nearly 25% of the Voter IDs assigned within only Delhi fail to
Re: [datameet] Security Issues with the Voter List
Leaving aside my earlier comment as perhaps tongue in cheek, the electoral rolls are *meant* to be public. The Registration of Electors Rules, 1960 makes that clear. However, your larger point is well made. Maybe what needs to be done is to *de-centralise* the storage? That fulfils the requirements of the Registration of Electors Rules, 1960 while making it harder to do something like this.

It says:

As soon as the roll for a constituency is ready, the registration officer shall publish it in draft by making a copy thereof available for inspection and displaying a notice in Form 5--
(a) at his office, if it is within the constituency, and
(b) at such place in the constituency as may be specified by him for the purpose, if his office is outside the constituency; [or in the official website of the Chief Electoral Officer of the concerned State:]
[Provided that where such draft contains names of overseas electors, the copies of such rolls shall also be published in the Electronic Gazette [or in the official website of the Chief Electoral Officer of the concerned State].]

The Representation of the People Act, 1951 contains this:

The Government shall, at any election to be held for the purposes of constituting the House of the People or the Legislative Assembly of a State, supply, free of cost, to the candidates of recognised political parties such number of copies of the electoral roll, as finally published ...

Worth asking if we want political parties to have free access to it but not citizens.

People Act, 1950 (43 of 1950)

-- For more details about this list http://datameet.org/discussions/
Re: [datameet] Security Issues with the Voter List
Raphael,

you raise very pertinent issues. We as a community love open data, and in this country there is a lot that can be done to free all kinds of data so that it can be put to good use (election data in an aggregated form is one example). But at the same time there are certain kinds of data which are not open (I mean not open in a machine-readable format) for a good reason. I believe voter roll data is one such type. In the past, voter lists have been used to pinpoint members of specific communities, who were then targeted with gruesome effect. I shudder to think what happens if it is automated: a 'riot app'? As Raphael points out, this is not just about privacy, but could be much worse.

This group is a fantastic initiative, and as it evolves it would be great for us to involve more social scientists and policy experts, so that as we advocate vociferously to free more data and make it open, we can also bring in the technical expertise here to recommend where data needs to be better protected and how.

cs

On Fri, Apr 11, 2014 at 11:44 AM, Raphael Susewind li...@raphael-susewind.de wrote:
Hi Devdatta and Avinash, yes, I, too, am frankly surprised at the ease with which one can access sensitive data in bulk. [...]
Re: [datameet] Security Issues with the Voter List
Hi Gautam

I don't think the issue is with having the electoral roll available publicly per se. Personally, I think it's better that the rolls are available in the open, as compared with the alternative, where they are confidential, thus leaving them open to other types of abuse. But I do think that certain minimum safeguards should be in place - even something as simple as a captcha code (as mentioned in the link which started off this thread) to deter heavy bulk downloading... it seems to me the bare minimum. Now, will this stop me from searching for someone specific within the voters list that I want to target, given that I have a rough idea of where they live? Certainly not.

Coupled with this is the irony that other datasets for which there is absolutely no reason for secrecy (at least I can't conceive of a reason for it - maybe it's pure bureaucracy) are extremely difficult to get. A case in point is any official version of the PC, AC shapefiles which Raphael and others on this group have been trying so hard to create.

Raphael is right - these are complex issues. And we have barely begun to scratch the surface of what should be done. Interestingly, in the reddit thread linked above, there are references to the fact that New York or Sweden too provide vast amounts of personal information for little or no fee...

Avinash

On Fri, Apr 11, 2014 at 11:57 AM, Gautam John gkj...@gmail.com wrote:
Leaving aside my earlier comment as perhaps tongue in cheek, the electoral rolls are *meant* to be public. The Registration of Electors Rules, 1960 makes that clear. However, your larger point is well made. Maybe what needs to be done is to *de-centralise* the storage? [...]
Re: [datameet] Security Issues with the Voter List
Chandrashekhar,

just on the specific issue of targeting communities, which I have thought about a great deal (my first book was on post-2002 Gujarat), my tentative conclusion is this: The fact that electoral rolls had been used in the past in riots before they were available online shows that rioters, if they want to, can access this data already. As Gautam pointed out, it IS public by law. What changes is merely the scale of data availability. Large-scale data would only be 'more useful' for large-scale targeting, however (small-scale targeting is possible already), which I don't see happening at this time (with the troublesome exception of Gujarat, particularly troublesome now that Mr Modi runs for PM - but here, too, the targeting happened in small units on the ground, even though coordination took place higher up).

On the other hand, fine-grained large-scale data is absolutely necessary to understand a range of issues about (religious, caste) economic position. So in this specific case, we have additional benefits but no additional risk (beyond the worrisome risk already out there)...

More detailed arguments about this in a forthcoming paper of mine at http://pub.uni-bielefeld.de/publication/2631138

Best,
Raphael

On 11.04.2014 08:49, Chandrashekhar Raman wrote:
Raphael, you raise very pertinent issues. We as a community love open data and in this country there is a lot that can be done to free all kinds of data so that it can be made use of in a good way (election data in an aggregated form is one example). [...]
Re: [datameet] Security Issues with the Voter List
Raphael,

To clarify, I am not trying to make a case against the availability of fine-grained data; far from it. I'm with you on this argument, among others that are made spuriously to restrict access. I might have stretched the point, but then again, killing is just one extreme form of discrimination; there are others that are less visible. You summed it up very well: it's good to have a healthy caution and unease when dealing with some of this data, and there are probably no simple answers here. Will read the paper at leisure.

cs.

On Fri, Apr 11, 2014 at 12:37 PM, Raphael Susewind li...@raphael-susewind.de wrote:
Chandrashekhar, just on the specific issues of targeting communities, which I have thought about a great deal (my first book was on post-2002 Gujarat), my tentative conclusion is this: The fact that electoral rolls had been used in the past in riots before they were available online shows that rioters, if they want to, can access this data already. [...]
Re: [datameet] Security Issues with the Voter List
Not sure this is a flaw. Maybe it's a feature? :D