Re: [datameet] Parliamentary constituency boundaries 2019

2019-03-26 Thread Raphael Susewind
Dear all,

just a warning that my polling booth locality data is quite useless for
2019 in most states - booth IDs will have changed... For UP I might get
around to updating them, but not currently for other states. So don't use
this data to map the ongoing elections!

You have been warned ;-)
Raphael

On 3/25/19 2:12 PM, Arun Ganesh wrote:
> Hey all, I've made a map of the electoral boundaries and polling booths
> using available data: Electoral Map of India
> <https://api.mapbox.com/styles/v1/planemad/cjoescdh20cl62spey0zj3v19.html?fresh=true=true_token=pk.eyJ1IjoicGxhbmVtYWQiLCJhIjoiemdYSVVLRSJ9.g3lbg_eN0kztmsfIPxa9MQ#4.56/22.34/75.08>
> 
> Data sources:
> - Assembly constituency boundaries: Datameet
> https://github.com/datameet/maps/tree/master/assembly-constituencies
> - Polling booths: Raphael Susewind
> (2014) https://pub.uni-bielefeld.de/data/2674065
> 
> Have used the assembly boundaries as they had a higher definition than
> the parliamentary boundaries. When you zoom in, the PC name should be
> visible.
> 
> Is there a more recent dataset of the polling booths available anywhere?
> Found scraped data from the ECI site at
> https://github.com/aaronrudkin/IndianPollingStations from 5 months ago,
> but it looks like the lat/long values were not scraped and the booths were
> geocoded to the town centre instead, which is not very useful. The scraper
> also does not seem to work anymore.
> 
> On Tue, Mar 19, 2019 at 7:43 AM Avinash Celestine
> <avinash.celest...@gmail.com> wrote:
> 
> Constituency boundaries were last delimited in 2008, and have not
> changed since.
> 
> On Sat, Mar 16, 2019 at 12:06 AM Arun Ganesh <arungra...@gmail.com> wrote:
> 
> With the upcoming elections, this would be a hot dataset that
> everyone will be looking for. The best available dataset on the
> web right now is on the datameet repository
> 
> <https://github.com/datameet/maps/tree/master/parliamentary-constituencies>
> updated during the previous elections in 2014.
> 
> Does anyone know if there have been changes in the constituency
> boundaries since 2014? Also the existing boundaries are fairly
> generalized resulting in an accuracy of around a km.
> 
> See this comparison for Bengaluru: 1) PC shapes from datameet 2)
> AC shapes from datameet 3) PC shapes from Karnataka KSRSAC
> new1.gif
> 
> The KSRSAC boundaries were queried from their geoserver
> 
> <https://stg1.ksrsac.in/maps/rest/services/Polling/Polling_PC/MapServer/0/query?where=OBJECTID+is+not+null=esriGeometryEnvelope==esriSpatialRelIntersects===true=false===4326=false=falsefalse=false==false===geojson>
> and
> are super accurate up to the street level, but are limited to
> Karnataka. Does anyone know how we can source this for the
> entire country?
> 
> -- 
> Datameet is a community of Data Science enthusiasts in India.
> Know more about us by visiting http://datameet.org
> ---
> You received this message because you are subscribed to the
> Google Groups "datameet" group.
> To unsubscribe from this group and stop receiving emails from
> it, send an email to datameet+unsubscr...@googlegroups.com
> <mailto:datameet+unsubscr...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.
> 


Re: [datameet] Any project on Name Commonality

2018-03-06 Thread Raphael Susewind
Dear Pradeep,

it is possible in principle, though with complications (including
ethical ones). Have a look at my GitHub for starters on how to
extract names from the electoral rolls:

https://github.com/raphael-susewind

What is definitely possible is something like this:

https://www.raphael-susewind.de/blog/2012/noor-mohd-ali

Best,
Raphael

On 03/05/2018 05:29 PM, Pradeep Bhatt wrote:
> Hi All,
> 
> Has any work been done on name commonality in India, something like this
> site?
> 
> http://howmanyofme.com/
> 
> For example, finding how many people named "Yuvraj Singh" or
> "Priyanka Chopra" there are in India.
> 
> To those who have scraped Voter ID data: do you think it's possible?
> 
> Regards,
> Pradeep
> 


Re: [datameet] Visualization

2017-12-08 Thread Raphael Susewind
Dear Thej,

there is not much of a howto - I took my polling booth locality
shapefiles (https://pub.uni-bielefeld.de/data/2674065) as well as the
MODIS data that is available in convenient format from Naturalearth
(https://pub.uni-bielefeld.de/data/2674065), put both into QGIS, and
used their 'Join attributes by location' function. That's that. A
somewhat more involved processing chain underpins the polling booth
shapefile as such - all described in the link above...
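
For readers without QGIS at hand, the idea behind 'Join attributes by
location' can be sketched in plain Python. This is only an illustration of
the technique, not the processing chain used for the published data: the
field names (booth_id, xy, urban_class, ring) and the sample coordinates are
made up, polygons are assumed to be simple rings without holes, and a real
workflow would more likely use a GIS library.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Ray-casting test: True if pt lies inside the closed ring."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_by_location(booths: List[Dict], areas: List[Dict]) -> List[Dict]:
    """Copy the urban_class of the first containing area onto each booth."""
    joined = []
    for booth in booths:
        match = next(
            (a for a in areas if point_in_polygon(booth["xy"], a["ring"])), None
        )
        joined.append({**booth, "urban_class": match["urban_class"] if match else None})
    return joined


# Hypothetical sample data: two booths and one classified area.
booths = [
    {"booth_id": "A-1", "xy": (1.0, 1.0)},
    {"booth_id": "A-2", "xy": (5.0, 5.0)},
]
areas = [{"urban_class": 3, "ring": [(0, 0), (2, 0), (2, 2), (0, 2)]}]

result = join_by_location(booths, areas)
```

Booths that fall outside every polygon keep urban_class = None, which in
this sketch would stand for "no urban classification found".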

Srini could probably chip in with more detail on how he ran the analysis
per se?

Hope that helps,
Raphael

On 12/08/2017 07:30 AM, Thejesh GN wrote:
> 
> 
> http://www.thehindu.com/elections/gujarat-2017/voting-trends-show-a-clear-rural-urban-divide-for-cong-bjp-in-gujarat/article21285328.ece
> 
> 
> 
> Interesting, considering both Susewind and Ramani are part of the DataMeet
> community. It would be great to have a how-to, either audio or text.
> 
> Quote from the article:
> 
> Social anthropologist Raphael Susewind’s work on Gujarat was used to
> arrive at this. Dr. Susewind merges NASA’s urban-rural classifications
> (MODIS data) based on satellite information and the Election
> Commission’s polling booth data to identify if a booth is located in a
> rural or an urban setting. MODIS data classifies urban areas into highly
> urban, semi-urban, etc. in a scale of 1 to 9 (the lower number
> corresponds to higher urbanity). Sixty five per cent of the electorate
> voted in booths in rural areas while the rest in various urban
> classifications.
> 
> Thej
> --
> Thejesh GN *⏚* ತೇಜೇಶ್ ಜಿ.ಎನ್
> http://thejeshgn.com
> GPG ID :  0xBFFC8DD3C06DD6B0
> 


Re: [datameet] Need some Guidence on Parsing Electoral Roles.

2017-08-21 Thread Raphael Susewind
Hi Nikhil and Devdatta,

very useful references.

Just to jump in on table conversion: there is a Python script called
pdf-table-convert that is quite capable of detecting tables in PDFs.
It uses a graphical approach rather than a logical one, so it doesn't
matter how bad the PDF is - even scans work in principle.

Importantly, with the right options, the script gives you bounding box
coordinates for each cell, which you can feed into ghostscript (or
whatever you like) to extract an image of just that cell prior to OCRing
- which indeed saves a lot of time.

The whole processing chain is referenced in my GitHub scripts (see
below), namely in most versions of pdf2list.pl, where pdf-table-convert
is called towards the bottom and the output then fed to tesseract...
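
The crop-then-OCR step described above might look roughly like the sketch
below. It only builds the command lines and does not run them, and it is not
taken from pdf2list.pl. Assumptions: the cell bounding box is already
available as (x0, y0, x1, y1) in PDF points with the origin at the bottom
left (the actual output format of the table script is not shown here), the
file names are placeholders, and the Ghostscript options should be checked
against the documentation of the installed version.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def crop_command(pdf: str, page: int, bbox: BBox, out_png: str, dpi: int = 300) -> List[str]:
    """Ghostscript call that renders only the bbox region of one page to PNG."""
    x0, y0, x1, y1 = bbox
    width_pt, height_pt = x1 - x0, y1 - y0
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=png16m", f"-r{dpi}",
        f"-dFirstPage={page}", f"-dLastPage={page}",
        # Shrink the output canvas to the size of the cell ...
        f"-dDEVICEWIDTHPOINTS={width_pt:g}",
        f"-dDEVICEHEIGHTPOINTS={height_pt:g}",
        "-dFIXEDMEDIA",
        f"-sOutputFile={out_png}",
        # ... and shift the page so the cell's lower-left corner sits at the origin.
        "-c", f"<</PageOffset [{-x0:g} {-y0:g}]>> setpagedevice",
        "-f", pdf,
    ]


def ocr_command(png: str, out_base: str, lang: str = "hin") -> List[str]:
    """Tesseract call that writes the recognised text to <out_base>.txt."""
    return ["tesseract", png, out_base, "-l", lang]


# Hypothetical cell on page 3 of a roll.
gs_cmd = crop_command("roll.pdf", 3, (72, 500, 200, 520), "cell.png")
ocr_cmd = ocr_command("cell.png", "cell")
```

Each list can be passed to subprocess.run(cmd, check=True) once the paths
and the bounding box are real. OCRing one small cell image at a time is what
keeps the text of neighbouring cells from getting jumbled together.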

Best,
Raphael

On 08/20/2017 11:00 AM, Nikhil VJ wrote:
> Hi Devdatta,
> 
> I had come across the legacy Devnagri fonts issue earlier when I started
> working on budget data. The fonts are Shree-Dev, Kruti-Dev, Shivaji, etc
> : legacy fonts used in an era when unicode devnagri wasn't invented, and
> to get around, there was simple substitution like a = क etc. I've put up
> a graphic that shows this mapping for a few fonts
> : http://i.imgur.com/ICUC6Wk.png
> 
> I found a group named technical-hindi who have been working on simple
> javascript pages that convert these fonts to unicode devnagri (and
> back!). I used them, and with the content I had, I had to introduce some
> extra conversions, and it worked like a charm.
> 
> Their site where many converters are shared :
> https://sites.google.com/site/technicalhindi/home/converters
> Their google group: https://groups.google.com/forum/#!forum/technical-hindi
> 
> I've shared the modified converters I used here:
> http://ourpuneourbudget.in/tools/
> (only had those limited use cases)
> 
> In the process of studying these, I came upon an unexpected situation :
> If the document you are extracting data from is a PDF (which I also
> refer to as "digital graveyard"), then it is PREFERABLE if the fonts are
> in legacy Devnagri font rather than Unicode font! 
> 
> That's because as of today (or 2015 when I came across it), PDF
> technology doesn't handle unicode Devnagri well. Some distortions are
> done to make the glyphs "print" properly, which permanently distorts the
> original chars. The issue is described here:
> https://stackoverflow.com/questions/30756193/unable-to-copy-exact-hindi-content-from-pdf
> 
> ..So if the text in the PDFs you're working on is in legacy Devnagri
> instead of Unicode Devnagri, then you're actually lucky :P . 
> 
> If it's in Unicode then that PDF is a true digital graveyard :P. OCR can
> work, yes, but please tell me if you find a way to OCR a page table cell
> by table cell separately instead of jumbling up everything. I had also
> come across a project like yours a year ago but I backed out because I
> could not get around this issue: the fonts in the PDF were in Unicode.
> 
> Here's an issue I filed in the Tabula project related to this, and they
> fixed it for the legacy fonts extraction at least.
> https://github.com/tabulapdf/tabula/issues/303
> 
> 
> 
> --
> Cheers,
> Nikhil VJ
> +91-966-583-1250
> Pune / Mandangad, India
> DataMeet Pune chapter <https://datameet-pune.github.io/>
> Self-designed learner at Swaraj University <http://www.swarajuniversity.org>
> Blog <http://nikhilsheth.blogspot.in>
> 
> On Sat, Aug 19, 2017 at 11:21 PM, Raphael Susewind
> <li...@raphael-susewind.de <mailto:li...@raphael-susewind.de>> wrote:
> 
> Hi Devdatta,
> 
> I had run into the same issue, and indeed the only workaround is OCR.
> It's not just a different encoding than Unicode - it's actually garbled
> CMaps, which is much worse (i.e. not recoverable).
> 
> See my comments here for starters (and the badly written scripts):
> 
> https://github.com/raphael-susewind/india-religion-politics/tree/master/maharolls2014
> 
> As for Soundex, you might want to take a look at the IndicSoundex
> collection, which is more accurate than transliteration into latin
> followed by English soundex:
> 
> http://libindic.org/Soundex
> 
> Good news is that I have done the whole exercise for Maharashtra 2014,
> and may be able to share depending on what your project is about.
> Perhaps send me a PM and we can discuss further,
> 
> Best,
> Raphael
> 
> On 08/19/2017 06:14 PM, Devdatta Tengshe wrote:
> > I'm attempting to read Names, Ages & Genders from Electoral Rolls, so
> > that I can create a database o

Re: [datameet] Need some Guidence on Parsing Electoral Roles.

2017-08-19 Thread Raphael Susewind
Hi Devdatta,

I had run into the same issue, and indeed the only workaround is OCR.
It's not just a different encoding than Unicode - it's actually garbled
CMaps, which is much worse (i.e. not recoverable).

See my comments here for starters (and the badly written scripts):

https://github.com/raphael-susewind/india-religion-politics/tree/master/maharolls2014

As for Soundex, you might want to take a look at the IndicSoundex
collection, which is more accurate than transliteration into latin
followed by English soundex:

http://libindic.org/Soundex
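
For comparison, the baseline mentioned above (transliterate into latin, then
apply English Soundex) can be sketched as follows. This is the classic
American Soundex, not the IndicSoundex algorithm, and it assumes the name
has already been transliterated; the sample spellings are only illustrative.

```python
# Letter groups of classic American Soundex.
CODES = {}
for letters, digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of a latin-script name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return ("".join(out) + "000")[:4]


code_a = soundex("Mohammed")
code_b = soundex("Mohamad")
```

Two spellings that collapse to the same code are candidates for being the
same name. Because the rules were designed for English surnames, they can
merge or split transliterated Indian names in unhelpful ways, which is
presumably why a script-aware variant is the better choice.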

Good news is that I have done the whole exercise for Maharashtra 2014,
and may be able to share depending on what your project is about.
Perhaps send me a PM and we can discuss further,

Best,
Raphael

On 08/19/2017 06:14 PM, Devdatta Tengshe wrote:
> I'm attempting to read Names, Ages & Genders from Electoral Rolls, so
> that I can create a database of Names, to figure out the General Spread
> of Specific Names across locations, and ages.
> 
> I began working with Mumbai's rolls, and am running into the following
> issues:
> 
> 1) The Electoral Rolls are not in English, but in Devanagari. This is
> not a major issue, because I could transliterate it into English for
> comparison (I need the names to be in English, so that I can use Soundex
> to remove misspellings etc). I know libraries for transliteration that
> work with Devanagari (Hindi & Marathi). Is there anything similar for
> other scripts such as Kannada & Tamil etc?
> 
> 2) While the Rolls are in Devanagari, the text is not actually in
> Unicode. It is in some other font, and hence when I get the text out,
> it's garbage. Since others have worked with the rolls before, is there a
> better way to get the text out?
> 
> 3) If it's not possible to get the text out, can we use OCR? What OCR
> library is best at working with Indic scripts?
> 
> If anyone has some experience to share on these issues, it will be much
> appreciated.
> 


Re: [datameet] Re: State Election data

2017-03-14 Thread Raphael Susewind
Hi Srinivas & Shantanu,

for 2007, 2009 (GE), 2012 and 2014 (GE), I already have the Form 20 for
UP scraped here:

https://github.com/raphael-susewind/india-religion-politics/tree/master/upvidhansabha2007

https://github.com/raphael-susewind/india-religion-politics/tree/master/uploksabha2009

https://github.com/raphael-susewind/india-religion-politics/tree/master/upvidhansabha2012

https://github.com/raphael-susewind/india-religion-politics/tree/master/uploksabha2014

The Form 20 for 2017 will be added as soon as it is available. Be
careful, as the booth ID codes change between elections (and 2007 was
pre-delimitation anyway).

Cheers,
Raphael

On 03/14/2017 08:40 AM, shantanu choudhary wrote:
> 
> 
> On Tue, Mar 14, 2017 at 1:51 PM, Bhanu Kamapantula <talk2k...@gmail.com
> <mailto:talk2k...@gmail.com>> wrote:
> 
> Hi Srinivas,
> 
> Electors per constituency data has to be scraped out of the state
> PDFs at ECI website. Not sure if anyone's done it already.
> 
> 
> For UP, polling-booth-wise results are available
> here: http://ceouttarpradesh.nic.in/Form20.aspx It has data for the 2012
> assembly elections and provides one Excel sheet per assembly constituency,
> which might be easier to process than parsing PDFs. I am not sure if the
> same is available for other states (I didn't find any such listing for
> Punjab at least). If you are using Linux, I was able to get the Excel
> sheets for all 403 constituencies using this shell command:
> for i in {1..403}; do wget http://ceouttarpradesh.nic.in/Form20_12/$i.xls; done
> 
> -- 
> Regards
> Shantanu
> 


Re: [datameet] Polling Station Locations

2017-03-13 Thread Raphael Susewind
Hi Joseph,

have a look here: (using 2014 booth IDs)

https://dx.doi.org/10.4119/unibi/2674065

Best,
Raphael

On 03/12/2017 09:59 AM, Joseph Sebastian wrote:
> Hi, 
> 
> The election commission has the polling station location data here
> http://psleci.nic.in/
> 
> Has anyone extracted this? 
> 
> I am specifically looking for polling station data for Kerala. 
> 
> Regards,
> 
> Joseph Sebastian 
> 


Re: [datameet] Electoral roll frontpage details for Andhra, Delhi, Haryana, Karnataka, Kerala, Maharashtra, MP, Orissa, Rajasthan and UP

2017-03-07 Thread Raphael Susewind
Hi Sutirtha,

happy to hear - and yes, of course: if you or somebody else here takes
some time to merge in Census 2011 IDs into my dataset - perhaps starting
with one state for the time being, such as UP - that would be very
helpful to many people.

I expect there will be quite a bit of fuzzy matching involved, though -
which is one reason why it would be good if somebody were to test
whether the workflow you suggest works out, so that we get an idea of
how much effort would be involved to do this at scale...
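
To give a feel for the fuzzy matching step, here is a minimal sketch using
Python's standard difflib. It assumes the vernacular names have already been
mapped to latin script as in the workflow quoted below, and the census codes
and village spellings are invented for illustration; real data would need a
tuned cutoff and probably matching within the same district or tehsil only.

```python
import difflib
from typing import Dict, List, Optional


def match_villages(
    roll_names: List[str], census: Dict[str, str], cutoff: float = 0.7
) -> Dict[str, Optional[str]]:
    """Map each roll spelling to the census code of its closest name, if any."""
    by_name = {name.lower(): code for code, name in census.items()}
    candidates = list(by_name)
    matches: Dict[str, Optional[str]] = {}
    for raw in roll_names:
        hits = difflib.get_close_matches(
            raw.strip().lower(), candidates, n=1, cutoff=cutoff
        )
        matches[raw] = by_name[hits[0]] if hits else None
    return matches


# Hypothetical census codes and spellings.
census = {"C001": "Rampur", "C002": "Sitapur", "C003": "Bahraich"}
roll_names = ["Rampoor", "Seetapur", "Lucknow"]

result = match_villages(roll_names, census)
```

Names that find no candidate above the cutoff come back as None and would
need manual review, which is where most of the effort at scale is likely to
go.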

Feel free to submit pull requests to github and I'll include it...

Best,
Raphael


On 03/07/2017 12:32 PM, Sutirtha Roy wrote:
> Hi Raphael -- The village names in vernacular script are available. Towns
> I am not sure about.
> 
> 1. Go to http://164.100.129.6/netnrega/secc_list.aspx
> 2. Choose the local language radio button
> 3. Choose the state > district > tehsil > gram_panchayat
> 4. It will give you the vernacular<>latin map
> 5. Match the latin names with the Census 2011 ID from lgdirectory
> 
> 
> Let me know if you need help. Your poll booth data compilation has been
> immensely useful to me -- and I would be very happy to support your efforts.
> 
> 
> Best,
> SR
> 
> On Tue, Mar 7, 2017 at 5:03 PM, Raphael Susewind
> <li...@raphael-susewind.de <mailto:li...@raphael-susewind.de>> wrote:
> 
> Dear all,
> 
> I just updated my GitHub repo on electoral data with the various details
> on the electoral roll frontpages of the above-mentioned states (Gujarat to
> be added later this week). This is basically (in vernacular script) the
> district, taluk, village, ward, address, name, pincode, etc of each
> booth. The data are in the respective *id tables of my dataset, so for
> Karnataka it would be in the karid table. Hope this is useful:
> 
> https://github.com/raphael-susewind/india-religion-politics
> 
> This is based on the 2014 rolls, used during the last general election.
> 
> Feel free to experiment with it and please alert me to any mistakes -
> it's a semi-automated process, so mistakes can always happen.
> 
> Incidentally, does anybody know of a source that links Census ID codes to
> village names in vernacular script? The lgov directory only maps them to
> latin script. Ultimately, the goal is of course to link electoral and
> census data together at finer levels - something many here have been
> interested in over the years...
> 
> Best,
> Raphael
> 

-- 
Dr Raphael Susewind | Lecturer in Social Anthropology & Development
| Department of International Development
| King's College London, London WC2R 2LS, UK
| https://www.raphael-susewind.de

Please consider PGP for encryption: https://keybase.io/raphaelsusewind




[datameet] Electoral roll frontpage details for Andhra, Delhi, Haryana, Karnataka, Kerala, Maharashtra, MP, Orissa, Rajasthan and UP

2017-03-07 Thread Raphael Susewind
Dear all,

I just updated my GitHub repo on electoral data with the various details
on the electoral roll frontpages of the above-mentioned states (Gujarat to
be added later this week). This is basically (in vernacular script) the
district, taluk, village, ward, address, name, pincode, etc of each
booth. The data are in the respective *id tables of my dataset, so for
Karnataka it would be in the karid table. Hope this is useful:

https://github.com/raphael-susewind/india-religion-politics

This is based on the 2014 rolls, used during the last general election.

Feel free to experiment with it and please alert me to any mistakes -
it's a semi-automated process, so mistakes can always happen.

Incidentally, does anybody know of a source that links Census ID codes to
village names in vernacular script? The lgov directory only maps them to
latin script. Ultimately, the goal is of course to link electoral and
census data together at finer levels - something many here have been
interested in over the years...

Best,
Raphael



Re: [datameet] Pincode Boundaries of India

2017-03-07 Thread Raphael Susewind
Hi Palash,

no need to - have just pushed it all to GitHub (see separate
announcement)...

Best,
Raphael

On 03/07/2017 11:22 AM, Palash Kulshrestha wrote:
> Hi Veena,
> I may be able to help if you can clearly define the steps (I am not able
> to understand the Kannada language).
> As far as I can see, the pincode in the PDF is a 6-digit number which can
> easily be grepped from the PDF. The question is where the polling booth
> name is.
> 
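
The "grep the 6-digit number" idea can be sketched with a regular
expression. This assumes the frontpage text has already been extracted from
the PDF into a string and that the pincode is printed in ASCII digits; rolls
that use native-script numerals would need the digits normalised first. The
sample line is invented.

```python
import re
from typing import List

# Six ASCII digits, not starting with 0, and not part of a longer digit run.
PIN_RE = re.compile(r"(?<![0-9])[1-9][0-9]{5}(?![0-9])")


def find_pincodes(text: str) -> List[str]:
    """Return every standalone 6-digit pincode candidate found in text."""
    return PIN_RE.findall(text)


# Hypothetical frontpage line.
sample = "Govt. Primary School, Ward 12, Mysuru 570001 (Part 45)"
pins = find_pincodes(sample)
```

Any other 6-digit number on the page would match as well, so the result is
best treated as a list of candidates rather than a confirmed pincode.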



Re: [datameet] data request

2017-03-06 Thread Raphael Susewind
Dear Roshan,

I have them booth-wise, which should be easy to aggregate to AC level:

https://github.com/raphael-susewind/india-religion-politics/tree/master/upvidhansabha2012

https://github.com/raphael-susewind/india-religion-politics/tree/master/uploksabha2014

Otherwise have a look at the CEO Uttar Pradesh website...
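
Aggregating booth-wise results to AC level is essentially a group-and-sum.
A minimal sketch, assuming the booth results have been loaded as one record
per booth and party; the keys ac_id, booth_id, party and votes are
hypothetical and do not reflect the actual table layout in the repository.

```python
from collections import defaultdict
from typing import Dict, Iterable


def aggregate_to_ac(rows: Iterable[dict]) -> Dict[int, Dict[str, int]]:
    """Sum booth-level votes per party within each assembly constituency."""
    totals: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for row in rows:
        totals[row["ac_id"]][row["party"]] += row["votes"]
    return {ac: dict(parties) for ac, parties in totals.items()}


# Hypothetical booth-level records.
rows = [
    {"ac_id": 1, "booth_id": 1, "party": "BJP", "votes": 120},
    {"ac_id": 1, "booth_id": 1, "party": "SP", "votes": 95},
    {"ac_id": 1, "booth_id": 2, "party": "BJP", "votes": 80},
    {"ac_id": 2, "booth_id": 1, "party": "BSP", "votes": 60},
]

ac_totals = aggregate_to_ac(rows)
```

Since booth IDs change between elections, comparisons across 2012 and 2014
are safer at the AC level than booth by booth.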

Best,
Raphael

On 03/06/2017 12:45 PM, roshan kishore wrote:
> Would anyone have party-wise (BJP, BSP, Congress, SP) votes for each
> assembly segment for UP in the 2012 and 2014 elections?
> 
> Best
> Roshan 
> 
> Data Journalist 
> Mint 
> 
> official id: rosha...@htlive.com <mailto:rosha...@htlive.com> 
> 

-- 
Dr Raphael Susewind | Lecturer in Social Anthropology & Development
| Department of International Development
| King's College London, London WC2R 2LS, UK
| https://www.raphael-susewind.de

Please consider PGP for encryption: https://keybase.io/raphaelsusewind




Re: [datameet] Re: Oriya and Malayalam readers...

2017-03-06 Thread Raphael Susewind
Hi Nishadh (?) and George (?),

thanks very much - this is exactly what I hoped for. The complete
dataset with frontpage details will be on my GitHub sometime next week;
will let the group know when and where,

Best,
Raphael

On 03/04/2017 03:39 AM, nishadh wrote:
> For Malayalam, the translated words are mostly on the right and always
> above the Malayalam words
> 
> On Saturday, March 4, 2017 at 8:28:13 AM UTC+5:30, Naveen Francis wrote:
> 
> Hi Raphael,
> 
> The electoral rolls contain very old data, even those published in 2017:
> http://ceo.kerala.gov.in/electoralrolls.html
> 
> The Taluk/Local Govt mapping has changed a lot.
> The data ECI publishes is based on the 2005 Local Govt delimitation.
> The number of taluks has increased from 63 to 75.
> 
> The State Election Commission had updated data, but their list is no
> longer visible.
> sec.kerala.gov.in
> 
> Thanks,
> Naveen
> 
> 
> On Friday, 3 March 2017 19:49:12 UTC+5:30, Raphael Susewind wrote:
> 
> Dear all,
> 
> I am currently extracting front page information from electoral rolls -
> village, taluk, district, station name, station address, pincodes etc.
> Some people here are also interested in this as far as I remember...
> 
> Since I don't read Malayalam and Oriya, could somebody here help me out
> and translate the Oriya bits from the attached image? And someone else
> tell me where on Kerala's rolls these variables (station name, address,
> village, taluk, etc) can be found (Kerala uses a somewhat different
> frontpage layout - see second attachment)?
> 
> Much appreciated,
> and expect the results on GitHub soon,
> 
> Best,
> Raphael
> 


Re: [datameet] Re: Oriya and Malayalam readers...

2017-03-06 Thread Raphael Susewind
Hi Naveen,

thanks for alerting me to the changes - I am interested in whatever was
the case during 2014 elections, but will make a note on the published
dataset to alert others that things change constantly...

Best,
Raphael

On 03/04/2017 02:58 AM, Naveen Francis wrote:
> Hi Raphael,
> 
> The electoral rolls contain very old data, even those published in 2017:
> http://ceo.kerala.gov.in/electoralrolls.html
> 
> The Taluk/Local Govt mapping has changed a lot.
> The data ECI publishes is based on the 2005 Local Govt delimitation.
> The number of taluks has increased from 63 to 75.
> 
> The State Election Commission had updated data, but their list is no
> longer visible.
> sec.kerala.gov.in
> 
> Thanks,
> Naveen
> 
> 
> On Friday, 3 March 2017 19:49:12 UTC+5:30, Raphael Susewind wrote:
> 
> Dear all,
> 
> I am currently extracting front page information from electoral rolls -
> village, taluk, district, station name, station address, pincodes etc.
> Some people here are also interested in this as far as I remember...
> 
> Since I don't read Malayalam and Oriya, could somebody here help me out
> and translate the Oriya bits from the attached image? And someone else
> tell me where on Kerala's rolls these variables (station name, address,
> village, taluk, etc) can be found (Kerala uses a somewhat different
> frontpage layout - see second attachment)?
> 
> Much appreciated,
> and expect the results on GitHub soon,
> 
> Best,
> Raphael
> 


Re: [datameet] Library to read tables in scanned PDFs

2017-01-23 Thread Raphael Susewind
Hi Mohit,

just to add - a hacked-but-working workflow to extract the table
structure and OCR bits and pieces as needed can be found in my GitHub,
for instance here (at the bottom of the perl file):

https://github.com/raphael-susewind/india-religion-politics/blob/master/rajrolls2014/run-in-arc/pdf2list.pl

It boils down to

pdf-table-extract -i $file -p $page -r 300 -l 0.7 -t cells_xml

for each page, parsing the results to extract cell coordinates, then

gs -q -r300 -dFirstPage=$page -dLastPage=$page -sDEVICE=tiffgray
-sCompression=lzw -o $temp.tif -g".$width."x".$height." -c '<> setpagedevice' -f $file

to get a TIFF of this cell, to be fed into

tesseract -psm 4 -l hin temp.tif stdout

(in the case of devanagari)
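
A minimal Python sketch of how the per-page calls above could be scripted. It only assembles the argument lists, using exactly the flags quoted in this message; the function names and the file names roll.pdf and temp.tif are placeholders, and the Ghostscript cropping step is left out because its page-device argument is not fully reproduced above.

```python
import shlex

def table_extract_cmd(pdf_path, page):
    """argv for the per-page pdf-table-extract call quoted above."""
    return ["pdf-table-extract", "-i", pdf_path, "-p", str(page),
            "-r", "300", "-l", "0.7", "-t", "cells_xml"]

def tesseract_cmd(tif_path, lang="hin"):
    """argv for OCR of one cell image, with the text written to stdout."""
    # "-psm" is the spelling used in this thread (Tesseract 3.x);
    # Tesseract 4 and later spell the same option "--psm".
    return ["tesseract", tif_path, "stdout", "-psm", "4", "-l", lang]

if __name__ == "__main__":
    # Print the commands instead of running them, so nothing needs
    # to be installed to try this out.
    for page in (1, 2):
        print(shlex.join(table_extract_cmd("roll.pdf", page)))
    print(shlex.join(tesseract_cmd("temp.tif")))
```

Each list can be handed to subprocess.run(..., capture_output=True, text=True) once the tools are installed; passing a list rather than a string avoids shell quoting problems with odd file names.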

Best of luck,
Raphael

On 01/23/2017 09:20 AM, Amanbir Singh wrote:
> Hi Mohit,
> 
> You'll have to use OCR on the pdf before any other method can be
> applied. This obviously makes it more complicated, but still manageable. 
> 
> You could use the Tesseract, a popular OCR package
> (https://github.com/tesseract-ocr/tesseract) and then try using tabula
> or the other packages mentioned. I've also had success using Xpdf
> (http://www.foolabs.com/xpdf/) to convert pdfs to text and then parsing
> the text. 
> 
> Aman
> 
> 
> On Friday, 20 January 2017 18:18:59 UTC+5:30, mohit ranjan wrote:
> 
> Tried Tabula, but again it's for PDF which has all the meta-data
> within it.
> I need it for paper scanned PDF/JPG and it fails by saying so
> 
> /"Sorry, your PDF file is image-based; it does not have any embedded
> text. It might have been scanned from paper... Tabula isn't able to
> extract any data from image-based PDFs. Click the Help button for
> more information."/
> 
> - Mohit
> 
> On Fri, Jan 20, 2017 at 6:14 PM, Srinivasan Ramani
> <sriniv...@gmail.com > wrote:
> 
> Tabula - http://tabula.technology/ works great with table
> extraction from PDFs. 
> 
> On Fri, Jan 20, 2017 at 5:51 PM, mohit ranjan
> <shoony...@gmail.com > wrote:
> 
> Thanks for response Johnson.
> 
> Is this the pdf-table-extract
> <https://github.com/ashima/pdf-table-extract> you are
> referring to ?
> It says, it reads table meta from PDF. 
> 
> My query was for scanned PDF/JPG images
> 
> - Mohit
> 
> On Fri, Jan 20, 2017 at 4:37 PM, Johnson Chetty
> <johnso...@gmail.com > wrote:
> 
> 
> Hello, 
> 
> I have had some reasonable success with 'pdfquery'
> if you like Python. It works with regional text as
> well. 
> Also, for tabular data, do try pdf-table-extract if
> quick and dirty works for you. 
> 
> Java folks should try pdfbox. 
> 
> 
> 
> 
> 
> On 20 January 2017 at 15:23, mohit ranjan
> <shoony...@gmail.com > wrote:
> 
> Sorry if this is off-topic, but have seen
> threads here about liberating data from PDFs.
> Most likely there will be lot of scanned PDFs
> among them.
> 
> Do we have any in-house expert on this and which
> library/tool (preferably not paid) to extract
> tables in scanned PDF/JPG ?
> 
> CVision
> 
> <http://www.cvisiontech.com/library/ocr/file-ocr/ocr-table-recognition.html>
> does a decent job, but it's paid.
> 
> 
> 
> - Mohit
> 

Re: Private message regarding: [datameet] Re: Village to AC mapping

2016-07-13 Thread Raphael Susewind
Dear Shafeeq,

thanks for noting this. Could you post your findings to the GitHub
README? I think it is important that others who might want to use that
data know about its limitations (I don't use it myself, merely posted it
on popular request).

The thing is: this is done through automated spatial matching, so there
is really nothing I can do about accuracy - I have a point cloud of
villages and AC polygon shapefiles, and merely let QGIS merge the two. I
cannot vouch for the accuracy of either the point cloud or the polygons
- they are proprietary, not my own creation...
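
For readers who want to see what such a spatial match does, here is a minimal point-in-polygon sketch in plain Python. It is not the QGIS workflow itself, only an illustration of the same idea; the constituency names, village IDs and coordinates are toy values, and real polygons with holes or multiple parts would need a proper GIS library.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting: True if (lon, lat) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def match_villages(villages, constituencies):
    """Map village id -> name of the first polygon containing it (or None)."""
    result = {}
    for village_id, (lon, lat) in villages.items():
        result[village_id] = next(
            (name for name, ring in constituencies.items()
             if point_in_polygon(lon, lat, ring)),
            None,
        )
    return result

# Toy data: two adjacent square "constituencies" and three "villages".
constituencies = {
    "AC-1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "AC-2": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
villages = {"v1": (1.0, 1.0), "v2": (3.0, 0.5), "v3": (5.0, 5.0)}

print(match_villages(villages, constituencies))
```

The match is only as good as its inputs: a village point with a wrong coordinate simply lands in whatever polygon contains that coordinate, which is how a village can end up in a far-away AC.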

Regards,
Raphael

On 13.07.2016 11:57, Shafeeq Rahman wrote:
> Dear Raphael
> 
> Thanks for sharing such useful information.
> 
> I just cross-checked this list for Uttar Pradesh and found errors in the
> pre- and post-delimitation tables. For example, the pre-delimitation table
> places villages of Ghaziabad district (like 909000201085000) in Pilibhit AC,
> which is far from Ghaziabad.
> 
> Kindly check this; if required, I can cross-check the other states
> as well.
> 
> Regards,
> 
> Shafeeq 
> 
> 
> 
> 
> 
> On Tuesday, 12 July 2016 18:18:39 UTC+5:30, Raphael Susewind wrote:
> 
> Dear Naveen,
> 
> yes, sure - I just pushed these to GitHub, pull request here:
> 
> https://github.com/datameet/india-election-data/pull/17
> <https://github.com/datameet/india-election-data/pull/17>
> 
> Best,
> Raphael
> 
> On 11.07.2016 08:21, Naveen Bharathi wrote:
> >
> > For my research I wanted the list of Assembly constituencies and
> their
> > corresponding villages (from census directories) pre-delimitation and
> > post-delimitation for Karnataka.
> > By any chance would you have the list for pre-delimitation assembly
> > segments and their corresponding census villages in the same way?
> >
> > On Monday, April 4, 2016 at 11:58:26 AM UTC+5:30, Raphael Susewind
> wrote:
> >
> > Dear all,
> >
> > some time ago, we had a discussion of linking Census villages to
> > assembly constituencies; there is also a dataset in the datameet
> > catalog: https://github.com/datameet/catalog
> >
> > Since this is not complete, though, and lacks Census ID codes,
> I have
> > generated a new table (through spatial matching); pull request
> here:
> >
> > https://github.com/datameet/india-election-data/pull/16
> >
> > Hope this is useful,
> > Raphael
> >
> > --
> > Dr Raphael Susewind | Associate, Contemporary South Asia Studies,
> > Oxford
> >  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld,
> Germany
> >   Web & Twitter | https://www.raphael-susewind.de
> <https://www.raphael-susewind.de>
> > <https://www.raphael-susewind.de
> <https://www.raphael-susewind.de>> | @RaphaelSusewind
> >  Impact | https://impactstory.org/raphael-susewind
> <https://impactstory.org/raphael-susewind>
> > <https://impactstory.org/raphael-susewind
> <https://impactstory.org/raphael-susewind>>
> >
> > Please consider https://www.gnupg.org for encryption (key id
> 10AEE42F)
> >
> 
> -- 
> Raphael Susewind | Melanchthonstr. 4a, 33615 Bielefeld, Germany
>  | https://www.raphael-susewind.de
> <https://www.raphael-susewind.de>
> 
> Please consider https://www.gnupg.org for encryption (key id 10AEE42F)
> 

-- 
Raphael Susewind | Melanchthonstr. 4a, 33615 Bielefeld, Germany
 | https://www.raphael-susewind.de

Please consider https://www.gnupg.org for encryption (key id 10AEE42F)



Re: [datameet] Re: Village to AC mapping

2016-07-12 Thread Raphael Susewind
Dear Naveen,

yes, sure - I just pushed these to GitHub, pull request here:

https://github.com/datameet/india-election-data/pull/17

Best,
Raphael

On 11.07.2016 08:21, Naveen Bharathi wrote:
> 
> For my research I wanted the list of Assembly constituencies and their
> corresponding villages (from census directories) pre-delimitation and
> post-delimitation for Karnataka. 
> By any chance would you have the list for pre-delimitation assembly
> segments and their corresponding census villages in the same way? 
> 
> On Monday, April 4, 2016 at 11:58:26 AM UTC+5:30, Raphael Susewind wrote:
> 
> Dear all,
> 
> some time ago, we had a discussion of linking Census villages to
> assembly constituencies; there is also a dataset in the datameet
> catalog: https://github.com/datameet/catalog
> <https://github.com/datameet/catalog>
> 
> Since this is not complete, though, and lacks Census ID codes, I have
> generated a new table (through spatial matching); pull request here:
> 
> https://github.com/datameet/india-election-data/pull/16
> <https://github.com/datameet/india-election-data/pull/16>
> 
> Hope this is useful,
> Raphael
> 
> -- 
> Dr Raphael Susewind | Associate, Contemporary South Asia Studies,
> Oxford
>  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
>   Web & Twitter | https://www.raphael-susewind.de
> <https://www.raphael-susewind.de> | @RaphaelSusewind
>  Impact | https://impactstory.org/raphael-susewind
> <https://impactstory.org/raphael-susewind>
> 
> Please consider https://www.gnupg.org for encryption (key id 10AEE42F)
> 

-- 
Raphael Susewind | Melanchthonstr. 4a, 33615 Bielefeld, Germany
 | https://www.raphael-susewind.de

Please consider https://www.gnupg.org for encryption (key id 10AEE42F)


Re: [datameet] Help with schools location data extracting

2016-05-21 Thread Raphael Susewind
Hi Nikhil,

most likely the flash application loads something like a JSON (or CSV,
if they are bad programmers ;-) ) from a specified API address. Use a
network sniffer to intercept the traffic that the flashplayer generates,
and see whether you can replicate the API.

If you are lucky, you will see HTTP requests to a URL along the lines
of http://schoolgis.nic.in/state_x/data.json?school=001 to
14986. In that case, you can then manually scrape the JSON files (if
need be by emulating a flashplayer's HTTP headers, though I doubt that
they check for this).
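
A minimal Python sketch of that lucky case, assuming the endpoint really does follow the illustrative pattern above (it is not a confirmed URL) and that IDs are zero-padded to at least three digits, which is a guess from "001". The function names are hypothetical, and the fetch helper is defined but never called here.

```python
import json
import time
import urllib.request

# Illustrative pattern copied from the message above; not a confirmed endpoint.
BASE_URL = "http://schoolgis.nic.in/state_x/data.json"

def school_urls(first=1, last=14986, min_width=3):
    """Yield one URL per school ID, zero-padded to at least min_width digits."""
    for school_id in range(first, last + 1):
        yield f"{BASE_URL}?school={school_id:0{min_width}d}"

def fetch_json(url, pause_seconds=1.0):
    """Fetch and decode one JSON document, pausing first to stay polite."""
    time.sleep(pause_seconds)
    # A browser-like User-Agent, in case the server checks request headers.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)

if __name__ == "__main__":
    # Only prints the first few URLs; nothing is downloaded here.
    for url in school_urls(1, 3):
        print(url)
```

Before looping fetch_json over the full range, check the site's terms of use, and save each response to disk so that an interrupted run can resume without re-downloading.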

If you are unlucky, it's a more complex API - some stateful frontends for
SQL databases can be very nasty to replicate, for instance. One brute
force kind of solution in such cases would be to write a custom proxy
server (there are python/perl/... modules for this) - i.e. a kind of
customized sniffer - and route your browser traffic through this, then
automate the browser (again, there are plugins for firefox and chrome
that have corresponding python or perl interfaces), and intercept the
traffic generated. That's the solution I found to scrape polling station
localities from the ECI server (before they put a bold copyright
disclaimer on it - now this kind of scraping would probably be illegal -
so do check these issues as well).

Let us know what you find out about the API,

Best of luck,
Raphael

On 21.05.2016 11:43, Nikhil VJ wrote:

> Hi friends,
> 
> is there any way to extract data from a Flash player platform such as the
> following?
> 
> schoolgis.nic.in <http://schoolgis.nic.in/>
> 
> --regards,
> Nikhil VJ
> Pune
> 

-- 
Dr Raphael Susewind | Associate, Contemporary South Asia Studies, Oxford
 Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
  Web & Twitter | https://www.raphael-susewind.de | @RaphaelSusewind
 Impact | https://impactstory.org/raphael-susewind

Please consider https://www.gnupg.org for encryption (key id 10AEE42F)



[datameet] Census EB coordinates (esp for Mumbai)?

2016-05-19 Thread Raphael Susewind
Dear all,

we had this discussion some time ago, and I fear the situation hasn't
changed - but I wonder whether anyone here can share lat/long
co-ordinates or maps of Census Enumeration Blocks, ie the smallest level
of Census operations?

I am particularly interested in Mumbai...

Best,
Raphael

-- 
Dr Raphael Susewind | Associate, Contemporary South Asia Studies, Oxford
 Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
  Web & Twitter | https://www.raphael-susewind.de | @RaphaelSusewind
 Impact | https://impactstory.org/raphael-susewind

Please consider https://www.gnupg.org for encryption (key id 10AEE42F)



[datameet] Village to AC mapping

2016-04-04 Thread Raphael Susewind
Dear all,

some time ago, we had a discussion of linking Census villages to
assembly constituencies; there is also a dataset in the datameet
catalog: https://github.com/datameet/catalog

Since this is not complete, though, and lacks Census ID codes, I have
generated a new table (through spatial matching); pull request here:

https://github.com/datameet/india-election-data/pull/16

Hope this is useful,
Raphael

-- 
Dr Raphael Susewind | Associate, Contemporary South Asia Studies, Oxford
 Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
  Web & Twitter | https://www.raphael-susewind.de | @RaphaelSusewind
 Impact | https://impactstory.org/raphael-susewind

Please consider https://www.gnupg.org for encryption (key id 10AEE42F)



Re: [datameet] Pincode Boundaries of India

2016-04-02 Thread Raphael Susewind
Hi Dev,

there are state/state.boothraw.* shapefiles, these should contain the
raw polling booth locations.

Heatmap scripts are terribly customized - I would have to look into this
myself, I am afraid, which could take some time (very busy)

You would have to go with voronois for now, sorry,

Best,
Raphael

On 02.04.2016 09:02, Devdatta Tengshe wrote:
> Hi Raphael,
> 
> Firstly, thanks a lot for extracting this information.
> 
> I was looking at http://dx.doi.org/10.4119/unibi/2674065, but I could
> find only the Boundaries for the constituencies.
> 
> Can you tell us where we can find the locations of the polling booths
> that you had extracted?
> 
> Secondly, can you also share (if you still have them) the heatmaps code
> that you used to create the constituency boundaries? I think that is
> what will be required to create the pincode boundaries as well.
> 
> Regards,
> Dev
> 
> Regards,
> Devdatta
> 
> On Fri, Apr 1, 2016 at 6:31 PM, Raphael Susewind
> <li...@raphael-susewind.de <mailto:li...@raphael-susewind.de>> wrote:
> 
> Dear all,
> 
> following up on my earlier email, I just pushed a list of pincodes for
> all electoral booths across India to GitHub and made a pull request to
> the datameet repository:
> 
> https://github.com/datameet/pincodes/pull/2
> 
> Please note that this can be incomplete, and is based on a rather
> brutish, quick and dirty hack - see comments in rolls2pincode.pl
> <http://rolls2pincode.pl>. But it
> does use the same IDs as those in the 2014 elections, and hence can be
> combined with my GIS shapefiles for polling booths:
> 
> http://dx.doi.org/10.4119/unibi/2674065
> 
> I leave it to others to double-check accuracy and create actual pincode
> maps. I hope this is useful,
> 
> Best,
> Raphael
> 
> On 28.03.2016 07:50, Raphael Susewind wrote:
> 
> > Dear Avinash and all,
> >
> > I will try to make some time this week to scrape the pincodes from
> > electoral rolls for all polling booths in my electoral GIS shapefiles.
> >
> > Since pincode is in latin script, this should not be affected by the
> > much discussed PDF scraping issues with electoral rolls.
> >
> > We could then either go down the voronoi route, or alternatively
> use the
> > heatmap processing chain that I used to generate AC boundaries - this
> > latter would have the advantage of dealing with wrong coordinates
> in the
> > booth point dataset (basically, not all electoral booth
> coordinates are
> > correct; consequently, if we only voronoi, we would have a blip of
> > pincode B within a sea of pincode A quite frequently. The heatmap
> stuff
> > takes care of this).
> >
> > Since I am not familiar with postal boundaries: can anyone here
> confirm
> > whether pincode areas are contiguous, and whether each pincode has
> only
> > one area? Or can it be that several non-contiguous areas have the same
> > pincode, interspersed with other pincodes? (In which case voronoi
> would
> > perhaps be the better solution after all)
> >
> > In any case, I hope to give you the pincode for each polling booth by
> > end of the week or so (based on all-India 2014 electoral rolls),
> >
> > Best,
> > Raphael
> >
> > On 28.03.2016 06:33, Avinash Celestine wrote:
> >
> >> perhaps one way is to avoid using postal data altogether.
> >>
> >> All header pages in electoral rolls(the first page) contain the
> name of
> >> the polling station related to that roll, the PS number, and
> importantly
> >> the pin code.
> >>
> >>  A site like psleci.nic.in <http://psleci.nic.in>
> <http://psleci.nic.in> has geog coordinates
> >> of polling stations (though Raphael had collected the data earlier*).
> >> Matching the two will give a fairly dense scattering of points  - in
> >> fact much more dense than if we used some of the methods earlier
> in this
> >> thread.
> >>
> >> We thus have a way of associating a pin code with a geo
> coordinate. We
> >> can then use the voronoi method.
> >>
> >> Electoral rolls are mostly in pdf which make them difficult to
> scrape.
> >> But from what i have seen, for any given state, the location on the
> >> header page, of the pincode number is more or less co

Re: [datameet] Pincode Boundaries of India

2016-04-01 Thread Raphael Susewind
Dear all,

following up on my earlier email, I just pushed a list of pincodes for
all electoral booths across India to GitHub and made a pull request to
the datameet repository:

https://github.com/datameet/pincodes/pull/2

Please note that this can be incomplete, and is based on a rather
brutish, quick and dirty hack - see comments in rolls2pincode.pl. But it
does use the same IDs as those in the 2014 elections, and hence can be
combined with my GIS shapefiles for polling booths:

http://dx.doi.org/10.4119/unibi/2674065
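
A minimal Python sketch of such a join on the shared booth ID. The column names (booth_id, lat, lon, pincode) and the toy rows are assumptions for illustration; the real files may use different field names and ID formats.

```python
import csv
import io

def join_on_booth_id(booth_rows, pincode_rows, key="booth_id"):
    """Attach a pincode to each booth row; also return IDs with no match."""
    pincode_by_id = {row[key]: row["pincode"] for row in pincode_rows}
    joined, unmatched = [], []
    for row in booth_rows:
        if row[key] in pincode_by_id:
            joined.append({**row, "pincode": pincode_by_id[row[key]]})
        else:
            unmatched.append(row[key])
    return joined, unmatched

# Toy stand-ins for the shapefile attribute table and the pincode list.
booths = list(csv.DictReader(io.StringIO(
    "booth_id,lat,lon\nA-1,26.85,80.95\nA-2,26.86,80.96\nA-3,26.87,80.97\n")))
pincodes = list(csv.DictReader(io.StringIO(
    "booth_id,pincode\nA-1,000001\nA-2,000002\n")))

joined, unmatched = join_on_booth_id(booths, pincodes)
print(len(joined), unmatched)
```

Keeping the unmatched IDs around is a cheap completeness check: a long list suggests either gaps in the pincode extraction or that the two files do not use the same (2014) ID scheme.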

I leave it to others to double-check accuracy and create actual pincode
maps. I hope this is useful,

Best,
Raphael

On 28.03.2016 07:50, Raphael Susewind wrote:

> Dear Avinash and all,
> 
> I will try to make some time this week to scrape the pincodes from
> electoral rolls for all polling booths in my electoral GIS shapefiles.
> 
> Since pincode is in latin script, this should not be affected by the
> much discussed PDF scraping issues with electoral rolls.
> 
> We could then either go down the voronoi route, or alternatively use the
> heatmap processing chain that I used to generate AC boundaries - this
> latter would have the advantage of dealing with wrong coordinates in the
> booth point dataset (basically, not all electoral booth coordinates are
> correct; consequently, if we only voronoi, we would have a blip of
> pincode B within a sea of pincode A quite frequently. The heatmap stuff
> takes care of this).
> 
> Since I am not familiar with postal boundaries: can anyone here confirm
> whether pincode areas are contiguous, and whether each pincode has only
> one area? Or can it be that several non-contiguous areas have the same
> pincode, interspersed with other pincodes? (In which case voronoi would
> perhaps be the better solution after all)
> 
> In any case, I hope to give you the pincode for each polling booth by
> end of the week or so (based on all-India 2014 electoral rolls),
> 
> Best,
> Raphael
> 
> On 28.03.2016 06:33, Avinash Celestine wrote:
> 
>> perhaps one way is to avoid using postal data altogether.
>>
>> All header pages in electoral rolls(the first page) contain the name of
>> the polling station related to that roll, the PS number, and importantly
>> the pin code.
>>
>>  A site like psleci.nic.in <http://psleci.nic.in> has geog coordinates
>> of polling stations (though Raphael had collected the data earlier*).
>> Matching the two will give a fairly dense scattering of points  - in
>> fact much more dense than if we used some of the methods earlier in this
>> thread.
>>
>> We thus have a way of associating a pin code with a geo coordinate. We
>> can then use the voronoi method.
>>
>> Electoral rolls are mostly in pdf which make them difficult to scrape.
>> But from what i have seen, for any given state, the location on the
>> header page, of the pincode number is more or less constant, making it
>> possible to target just that part of the page with any pdf parser.
>>
>> Electoral rolls have become difficult to download in bulk( a good
>> thing!) but i understand different people on this group have the pdfs
>> for different states. Putting this stuff together should give us
>> comprehensive data on header pages for at least some states.
>> Alternatively, we can file RTIs for just the header pages of electoral
>> rolls, though I don't know how successful that would be.
>>
>> * Raphael's data is
>> at https://github.com/raphael-susewind/india-election-data
>>
>>
>>
>> On Sun, Mar 27, 2016 at 12:07 PM, srinivas kodali <iota.kod...@gmail.com
>> <mailto:iota.kod...@gmail.com>> wrote:
>>
>> Well, There were postal delivery zones in the past and the postal
>> department even used to make maps of these zones. The Delhi postal
>> delivery zone map
>> 
>> <https://drive.google.com/file/d/0B1RcWLku0ZOWWVBHMldrZWdfZEU/view?usp=sharing>
>>  had
>> boundaries for delhi. I am not sure if other cities had them or how
>> long the postal department was doing this, but it certainly can help
>> with the boundaries for cities.
>>
>> Regards,
>> Srinivas Kodali
>> www.lostprogrammer.com <http://www.lostprogrammer.com>
>> /"Not everyone who wanders is lost, I am probably a bit"/
>>
>> On Tue, Mar 22, 2016 at 9:29 PM, Arun Ganesh <arungra...@gmail.com
>> <mailto:arungra...@gmail.com>> wrote:
>>
>> Shravan, crowdsourcing the boundaries of pincodes is not as
>> trivial as you think. To start with, an area does not fall under
>> a pincode,

Re: [datameet] Pincode Boundaries of India

2016-03-27 Thread Raphael Susewind
Dear Avinash and all,

I will try to make some time this week to scrape the pincodes from
electoral rolls for all polling booths in my electoral GIS shapefiles.

Since pincode is in latin script, this should not be affected by the
much discussed PDF scraping issues with electoral rolls.

We could then either go down the voronoi route, or alternatively use the
heatmap processing chain that I used to generate AC boundaries - this
latter would have the advantage of dealing with wrong coordinates in the
booth point dataset (basically, not all electoral booth coordinates are
correct; consequently, if we only voronoi, we would have a blip of
pincode B within a sea of pincode A quite frequently. The heatmap stuff
takes care of this).
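
To illustrate the outlier problem, here is a minimal Python sketch that relabels each booth by majority vote among its k nearest booths before any polygons are built. This is a stand-in technique, not the heatmap processing chain mentioned above; the coordinates and pincode labels are toy values, and plain planar distance is assumed (projected coordinates).

```python
from collections import Counter
from math import hypot

def smooth_labels(booths, k=5):
    """Relabel each booth with the most common pincode among its k nearest
    booths (itself included). booths is a list of (x, y, pincode)."""
    smoothed = []
    for x, y, _ in booths:
        nearest = sorted(booths, key=lambda b: hypot(b[0] - x, b[1] - y))[:k]
        votes = Counter(pincode for _, _, pincode in nearest)
        smoothed.append(votes.most_common(1)[0][0])
    return smoothed

# Toy data: two clusters, plus one booth labelled "B" that sits in the
# middle of cluster "A" (for example because its coordinates are wrong).
booths = [
    (0, 0, "A"), (0, 1, "A"), (1, 0, "A"), (1, 1, "A"),
    (0.5, 0.5, "B"),  # the stray booth
    (10, 10, "B"), (10, 11, "B"), (11, 10, "B"), (11, 11, "B"),
    (10.5, 10.5, "B"),
]

print(smooth_labels(booths))
```

With plain voronoi the stray booth would carve out its own small "B" cell inside "A"; after the vote it takes the label of its surroundings. The trade-off is that a genuinely isolated pincode served by very few booths would be smoothed away too, so k needs care.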

Since I am not familiar with postal boundaries: can anyone here confirm
whether pincode areas are contiguous, and whether each pincode has only
one area? Or can it be that several non-contiguous areas have the same
pincode, interspersed with other pincodes? (In which case voronoi would
perhaps be the better solution after all)

In any case, I hope to give you the pincode for each polling booth by
end of the week or so (based on all-India 2014 electoral rolls),

Best,
Raphael

On 28.03.2016 06:33, Avinash Celestine wrote:

> perhaps one way is to avoid using postal data altogether.
> 
> All header pages in electoral rolls(the first page) contain the name of
> the polling station related to that roll, the PS number, and importantly
> the pin code.
> 
>  A site like psleci.nic.in <http://psleci.nic.in> has geog coordinates
> of polling stations (though Raphael had collected the data earlier*).
> Matching the two will give a fairly dense scattering of points  - in
> fact much more dense than if we used some of the methods earlier in this
> thread.
> 
> We thus have a way of associating a pin code with a geo coordinate. We
> can then use the voronoi method.
> 
> Electoral rolls are mostly in pdf which make them difficult to scrape.
> But from what i have seen, for any given state, the location on the
> header page, of the pincode number is more or less constant, making it
> possible to target just that part of the page with any pdf parser.
> 
> Electoral rolls have become difficult to download in bulk( a good
> thing!) but i understand different people on this group have the pdfs
> for different states. Putting this stuff together should give us
> comprehensive data on header pages for at least some states.
> Alternatively, we can file RTIs for just the header pages of electoral
> rolls, though I don't know how successful that would be.
> 
> * Raphael's data is
> at https://github.com/raphael-susewind/india-election-data
> 
> 
> 
> On Sun, Mar 27, 2016 at 12:07 PM, srinivas kodali <iota.kod...@gmail.com
> <mailto:iota.kod...@gmail.com>> wrote:
> 
> Well, There were postal delivery zones in the past and the postal
> department even used to make maps of these zones. The Delhi postal
> delivery zone map
> 
> <https://drive.google.com/file/d/0B1RcWLku0ZOWWVBHMldrZWdfZEU/view?usp=sharing>
>  had
> boundaries for delhi. I am not sure if other cities had them or how
> long the postal department was doing this, but it certainly can help
> with the boundaries for cities.
> 
> Regards,
> Srinivas Kodali
> www.lostprogrammer.com <http://www.lostprogrammer.com>
> /"Not everyone who wanders is lost, I am probably a bit"/
> 
> On Tue, Mar 22, 2016 at 9:29 PM, Arun Ganesh <arungra...@gmail.com
> <mailto:arungra...@gmail.com>> wrote:
> 
> Shravan, crowdsourcing the boundaries of pincodes is not as
> trivial as you think. To start with, an area does not fall under
> a pincode, rather a street does based on the post office that
> services it. Read
> this: http://www.georeference.org/doc/zip_codes_are_not_areas.htm
> 
> You may also want to do some background reading of existing
> research that has been done by the group
> here: https://datameet.hackpad.com/M4hPFJVV2Gm?eid=v4YoXN4tTw5
> 
> To sum up, nobody has precise pincode boundaries like how you
> imagine them, not even the postal department. Any existing
> datasets are an estimate at best using some data processing on a
> large volume of address data.
> 

Re: [datameet] Pincode Boundaries of India

2016-03-19 Thread Raphael Susewind
Hi Shravan,

another option - depending on what you are after - could be to use
Devdatta's point data for post offices, voronoi it into polygons, and
aggregate by pincode - that might not be the same as official
boundaries, but the closest you can get (each locality in India would be
assigned to the most proximate postoffice...)
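
Belonging to a post office's voronoi cell is the same thing as having that post office as nearest neighbour, so the assignment can be sketched in Python without building any polygons. The post office coordinates and the labels PIN-A / PIN-B are toy values, not taken from the actual post office data.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest_pincode(lat, lon, post_offices):
    """Pincode of the closest post office; post_offices is a list of
    (lat, lon, pincode) tuples."""
    closest = min(post_offices,
                  key=lambda po: haversine_km(lat, lon, po[0], po[1]))
    return closest[2]

# Toy post office points.
post_offices = [
    (28.60, 77.20, "PIN-A"),
    (28.70, 77.10, "PIN-A"),  # a second office sharing the same pincode
    (19.00, 72.80, "PIN-B"),
]

print(nearest_pincode(28.50, 77.00, post_offices))
print(nearest_pincode(18.90, 72.90, post_offices))
```

Because several offices can share one pincode, aggregating by pincode happens implicitly here: any location closest to either PIN-A office gets PIN-A. For country-wide data a spatial index would be needed instead of the linear scan.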

Best,
Raphael

On 17.03.2016 06:18, Jaisen Nedumpala wrote:
> Hi Shravan,
> 
> I don't think that you would get it that easy. I was in search of this
> data, since the year 2008. Eventually I could understand that even the
> Department of Posts doesn't have this data. We could do it as a community
> project to build it. Not easy, but not impossible.
> 
> 
> 2016-03-17 10:32 GMT+05:30 shravan <shravan.s...@gmail.com
> <mailto:shravan.s...@gmail.com>>:
> 
> Hey everyone,
> 
> I am looking for pin code boundaries of India, preferably in any of
> the GIS file formats ( kml, kmz, shp, geojson or any other ). It
> would be nice if someone can point me in the right direction, where
> I can get this data from.
> 
> Thanks,
> Shravan
> 
> -- 
> Datameet is a community of Data Science enthusiasts in India. Know
> more about us by visiting http://datameet.org
> ---
> You received this message because you are subscribed to the Google
> Groups "datameet" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to datameet+unsubscr...@googlegroups.com
> <mailto:datameet+unsubscr...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.
> 
> 
> 
> 
> -- 
> ~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~
>  - ജയ്സെനോവ് നെടുമ്പാലോവിച്ച് പഹയനോവ്സ്കി -
> ~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~
> (`'·.¸(`'·.¸^¸.·'´)¸.·'´)
> «´¨`·* . Jaisenov. *..´¨`»
> (¸.·'´(`'·.¸ ¸.·'´)`'·.¸)
> ¸.·´^.`'·.¸ ¸.·'´
>  ( `·.¸`·.¸
>   `·.¸ )`·.¸
>  ¸.·(´ `·.¸
> ¸.·(.·´)`·.¸
>   ( `v´ )
> `v´
> 

-- 
Dr Raphael Susewind | Associate, Contemporary South Asia Studies, Oxford
 Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
  Web & Twitter | https://www.raphael-susewind.de | @RaphaelSusewind
 Impact | https://impactstory.org/raphael-susewind

Please consider https://www.gnupg.org for encryption (key id 10AEE42F)



[datameet] Data on religion and politics in India now on GitHub

2016-02-22 Thread Raphael Susewind
Dear all,

over the last weeks, I moved my comprehensive dataset on religion and
politics in India to GitHub. This should make it more easily accessible
and also makes it easier for others - you - to add content.

So far, it includes Uttar Pradesh data, namely booth-level election
results, booth-level estimates of religious demography, candidate names,
GIS data, and a table that (partially) links booth-level data from 2007
through 2009 and 2012 to 2014. Detailed information on all the tables and
variables, on licenses, etc. is online here:

https://github.com/raphael-susewind/india-religion-politics

Do play around with it, give it a GitHub star if you like it, and
improve upon it! In a couple of weeks, I will create a formal first
release, in case no major bugs crop up until then.

This group has been a tremendous resource for me; I am glad to give back
what I can in the spirit of open data sharing and research.

All the best,
Raphael



Re: [datameet] Form 20 for Mumbai

2016-01-24 Thread Raphael Susewind
Dear all,

following my earlier email, I have now compiled booth-level results for
the 2014 assembly polls in Mumbai (ACs 152-187) and put them in the
datameet github repo (pull request pending) in case anyone is
interested: https://github.com/datameet/india-election-data/pull/16

Happy coding,
Raphael

On 16.01.2016 18:26, Raphael Susewind wrote:

> Dear all,
> 
> I am reviving this old thread to ask whether anyone has Form 20 election
> results for Maharashtra 2014 Assembly Polls in a usable format (csv,
> json, xml, whatever - but not scanned pdf...)?
> 
> The PDFs are online on the CEO website, but before I go to the trouble
> of extracting data, I wonder whether someone has done it already?
> 
> If not, I shall put them on the datameet github in a few weeks...
> 
> Thanks,
> Raphael

-- 
Dr. Raphael Susewind | Political anthropologist, Associate CSASP Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Web & Twitter | http://www.raphael-susewind.de | @RaphaelSusewind

Please do consider http://www.gnupg.org for encryption (key id 10AEE42F)



Re: [datameet] Form 20 for Mumbai

2016-01-16 Thread Raphael Susewind
Dear all,

I am reviving this old thread to ask whether anyone has Form 20 election
results for Maharashtra 2014 Assembly Polls in a usable format (csv,
json, xml, whatever - but not scanned pdf...)?

The PDFs are online on the CEO website, but before I go to the trouble
of extracting data, I wonder whether someone has done it already?

If not, I shall put them on the datameet github in a few weeks...

Thanks,
Raphael

On 07.11.2014 12:16, Avinash Celestine wrote:
> no unfortunately not. my impression is that maharashtra is slower at
> putting these out than some other states.
> 
> A
> 
> On Nov 7, 2014 2:09 PM, "Raphael Susewind" <li...@raphael-susewind.de
> <mailto:li...@raphael-susewind.de>> wrote:
> 
> Hi Avinash,
> 
> Thanks - I did not check all PDFs systematically, should have done...
> 
> Any idea when and/or whether assembly form 20 will be available?
> 
> Best,
> Raphael
> 
> On 07.11.2014 08:52, Avinash Celestine wrote:
> > I should mention that these are for the parliamentary elections of May
> > 2014, not the recent assembly elections.
> >
> > A
> >
> > On Fri, Nov 7, 2014 at 1:20 PM, Avinash Celestine
> > <avinash.celest...@gmail.com> wrote:
> >
> > Hi Raphael,
> >
> > some of those links are dead, but not all. seems not all form 20s
> > for each constituency have been uploaded yet. I have the ones for
> > which it is (downloaded sometime back) ...attached. as far as mumbai
> > is concerned, I think the south mumbai data is not there
> > yet...ignore the pdf files which are too small in size (2KB etc).
> > Those are the ones for which the links were dead.
> >
> > mhGE2014-incomplete.zip
> > <https://docs.google.com/file/d/0BxAgA1sHG2dMcDVrWFd1Vkozb2M/edit?usp=drive_web>
> >
> > On Fri, Nov 7, 2014 at 12:09 PM, Raphael Susewind
> > <li...@raphael-susewind.de> wrote:
> >
> > Dear all,
> >
> > does anyone have access to booth-level results for Maharashtra,
> > especially Mumbai, both general and assembly elections? Or any
> > information as to whether and when it might be available? On the CEO
> > website, one finds links to general election form 20, but those links
> > are dead. No mention of assembly data (either now or earlier):
> >
> > https://www.ceo.maharashtra.gov.in/Results/Form20.aspx
> >
> > Any hint appreciated,
> > Raphael
> >
> > --
> > Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
> > Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
> > Web & Twitter | http://www.raphael-susewind.de | @RaphaelSusewind
> >
> > Please do consider http://www.gnupg.org for encryption (key id 10AEE42F)

Re: [datameet] Parsing Voters List : Glyph to Unicode issue

2015-09-21 Thread Raphael Susewind
Hi Siddharth and Nikhil,

sorry for the delay, I was travelling for the past weeks. I have worked
extensively with the electoral rolls, and ultimately the only solution I
found for the problem of corrupted text is OCR - tesseract was the most
accurate in my experiments (and also comparatively fast...). It can be
automated, though scaling up would require vast resources.
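
An automated OCR run of the kind described above could look roughly like the following sketch. It is not the pipeline actually used for the electoral rolls: it assumes that the `pdftoppm` (poppler-utils) and `tesseract` command-line tools are installed, that the Hindi language data (`hin`) is available, and the function names and the 300 dpi setting are illustrative choices.

```python
import subprocess
from pathlib import Path


def rasterise_cmd(pdf_path, out_prefix, dpi=300):
    """pdftoppm writes one PNG per page, named <out_prefix>-<page>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]


def ocr_cmd(image_path, lang="hin"):
    """tesseract <image> <outputbase> -l <lang> writes <outputbase>.txt."""
    out_base = Path(image_path).with_suffix("")
    return ["tesseract", str(image_path), str(out_base), "-l", lang]


def ocr_pdf(pdf_path, work_dir, lang="hin"):
    """Rasterise every page of a PDF roll and OCR it; returns one string per page."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterise_cmd(pdf_path, work_dir / "page"), check=True)
    texts = []
    for image in sorted(work_dir.glob("page-*.png")):
        subprocess.run(ocr_cmd(image, lang), check=True)
        texts.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return texts
```

Each page goes through a full rasterise-and-recognise cycle, which is where the resource cost of scaling up comes from.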

Let us know if you find an alternative (though I am sceptical),

Best,
Raphael

On 19.09.2015 11:51, Nikhil VJ wrote:
> Hi Siddharth,
> 
> Sorry I missed this earlier.
> In April this year I converted a budget PDF to excel that had Marathi
> content, in legacy font (similar to ShreeDev). It was two-step : first
> extract to excel, and then replace all the text after passing through a
> legacy font to unicode converter (an HTML file with javascript)
> 
> http://nikhilsheth.blogspot.in/2015/05/diy-pdf-to-excel-spreadsheet-conversion.html
> 
> Just check your document or send me a copy. If it has legacy fonts, then
> copy-pasting from it gives random English letters and punctuation.
> If it's Unicode, then copy-pasting gives Unicode text, but
> inaccurate. It's possible that someone has already made a converter for
> this; if not, and you have enough content, you could make your
> own converter.
> 
> If the PDF has Unicode font in it, then my method fails.
> 
> I wasn't aware of the stackoverflow questions you've linked to. Great
> insights here into why Unicode extraction is failing.
> 
> If it's only a few pages, then this free online multi-language OCR tool
> might help: http://www.i2ocr.com/free-online-hindi-ocr
> (a slow page-by-page process, so only advisable if there is little content
> or if you have a slave army of interns at your disposal :P)
> 
> 
> 
> 
> --
> Cheers,
> Nikhil
> +91-966-583-1250
> Pune, India
> Self-designed learner at Swaraj University <http://www.swarajuniversity.org>
> http://nikhilsheth.blogspot.in
> 
> 
> 
> 
> 
> On Tue, Sep 1, 2015 at 7:37 PM, Siddharth Vijayakrishnan
> <svija...@gmail.com <mailto:svija...@gmail.com>> wrote:
> 
> Hi,
> 
> I downloaded a few files containing voter rolls and tried to parse
> the PDFs using pdfminer. Ran straight into a problem[1] where the
> glyphs are converted to unicode using a wrong character map.  Before
> I try and solve this on my own, I wonder if anyone in this community
> has a readymade solution ?
> 
> [1]
> 
> http://stackoverflow.com/questions/31876415/parsing-a-pdfdevanagari-script-using-pdfminer-gives-incorrect-output
> 



Re: [datameet] How to get polling station/booth lat-long data for an assembly constituency

2015-05-15 Thread Raphael Susewind
Dear all,

the whole dataset is also available at
http://dx.doi.org/10.4119/unibi/2674065 - raw data was scraped at a time
when psleci did not have a copyright disclaimer...

Best
Raphael

On 15.05.2015 20:23, Nikhil VJ wrote:
 Not sure about the legality of the data itself, but sharing a general
 method we can use in any mapping interface which is working here as well
 on my end.
 
 Install Firebug extension in Firefox browser.
 https://addons.mozilla.org/en-US/firefox/addon/firebug/
 
 Go to http://psleci.nic.in
 Activate Firebug. Console appears at bottom (or wherever you've
 positioned it)
 Go on Net tab
 Under that, XHR tab
 
 Select State, then District, then AC (Assembly Constituency)
 Then press one of the Search buttons.
 
 All the polling stations in that constituency come up.
 
 Now in the console, click on "POST GetGoogleObject" to expand it (this
 should be around 100 KB in size now, while one step earlier it was much
 smaller).
 Go to the JSON tab
 Click on "d" to expand it
 Right-click on "Points", and select "Copy Points as JSON"
 
 Now go to http://konklone.io/json/
 Paste the JSON stuff there
 It gets converted to a table, and you can see it.
 Download the CSV linked.
 Open up the CSV in Calc/Excel, and edit as per your needs.
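
The JSON-to-CSV step above can also be done locally. The sketch below assumes only what the steps describe, namely a response with a "d" object containing a "Points" list; the field names in the sample (`Latitude`, `Longitude`, `InfoHTML`) are hypothetical placeholders, not the actual keys returned by the site.

```python
import csv
import io
import json


def points_to_csv(raw_json):
    """Convert the copied JSON (whole response or just the Points list) to CSV text."""
    payload = json.loads(raw_json)
    # Accept either {"d": {"Points": [...]}} or the bare list copied from "Points".
    points = payload["d"]["Points"] if isinstance(payload, dict) else payload

    # Collect column names in first-seen order, since records may differ in keys.
    fieldnames = []
    for point in points:
        for key in point:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(points)  # missing keys are written as empty cells
    return buffer.getvalue()


# Hypothetical sample with made-up keys and values.
SAMPLE = '{"d": {"Points": [{"Latitude": 18.52, "Longitude": 73.85, "InfoHTML": "Booth 1"}]}}'

if __name__ == "__main__":
    print(points_to_csv(SAMPLE))
```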
 
 -
 Why Firebug is needed : The regular inspect element etc parts of
 Chrome and Firefox do help you to see the incoming JSON objects, but
 Firebug also lets you copy them out.
 
 Screenshot:
 Inline image 1
 
 
 --
 Cheers,
 Nikhil
 +91-966-583-1250
 Pune, India
 Self-designed learner at Swaraj University http://www.swarajuniversity.org
 http://nikhilsheth.blogspot.in
 
 



Re: [datameet] Any privacy issue in publishing names of voters?

2015-03-01 Thread Raphael Susewind
Hi Anand,

as someone who worked with the voter lists, including an analysis of
trends (http://www.raphael-susewind.de/blog/2012/noor-mohd-ali), I would
personally NOT put them online in disaggregated form. I would only share
aggregate data (e.g. the 50 most frequent names in state X and their
prominence over time, or some such). If you do put them online, I would
do so at state level only, not further disaggregated. But I DO think
there are big privacy issues here. There was a discussion on this on the
list a few months back as well - spurred by this post by Snehashish
Ghosh:
http://cis-india.org/internet-governance/blog/electoral-databases-2013-privacy-and-security-concerns
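
Publishing only an aggregate of the kind suggested above might be sketched as follows. The function name, the normalisation (trimming and upper-casing) and the `min_count` cut-off that suppresses rare names are assumptions added for illustration; they are not part of any existing dataset or agreed practice.

```python
from collections import Counter


def top_names(names, n=50, min_count=10):
    """Return the n most frequent names with counts, dropping rare ones.

    Names that occur fewer than min_count times are suppressed so that
    uncommon (and therefore more identifying) names never get published.
    """
    counts = Counter(name.strip().upper() for name in names if name.strip())
    return [(name, count) for name, count in counts.most_common(n) if count >= min_count]


if __name__ == "__main__":
    sample = ["Ali"] * 3 + ["ali "] + ["Ram"] * 2 + ["Sita"]
    print(top_names(sample, n=2, min_count=2))
```

Only the resulting table of names and counts would be shared, never the underlying roll.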

My 5 cents,
Raphael

On 02.03.2015 05:04, Anand Chitipothu wrote:
 Hi,
 
 I've voter data for couple of states with me. I'm thinking of publishing
 gender, age and name of all voters of these. Do you see any privacy
 issue in this? Any other issue that I should be careful about?
 
 I'm planning to sort the names before publishing so that the original
 order is lost.
 
 I think it'll be very interesting to study the patterns of how names are
 changing over time.
 
 Anand
 



Re: [datameet] Form 20 for Mumbai

2014-11-07 Thread Raphael Susewind
Dear all,

Mumbai city itself is online here:

http://www.electionmumbaicity.org/assembly2014boothwiseresults.html

Best,
Raphael

On 07.11.2014 07:39, Raphael Susewind wrote:
 Dear all,
 
 does anyone have access to booth-level results for Maharashtra,
 especially Mumbai, both general and assembly elections? Or any
 information as to whether and when it might be available? On the CEO
 website, one finds links to general election form 20, but those links
 are dead. No mention of assembly data (either now or earlier):
 
 https://www.ceo.maharashtra.gov.in/Results/Form20.aspx
 
 Any hint appreciated,
 Raphael
 



[datameet] Form 20 for Mumbai

2014-11-06 Thread Raphael Susewind
Dear all,

does anyone have access to booth-level results for Maharashtra,
especially Mumbai, both general and assembly elections? Or any
information as to whether and when it might be available? On the CEO
website, one finds links to general election form 20, but those links
are dead. No mention of assembly data (either now or earlier):

https://www.ceo.maharashtra.gov.in/Results/Form20.aspx

Any hint appreciated,
Raphael



[datameet] Data on Muslim electorate in UP and Gujarat

2014-09-27 Thread Raphael Susewind
Dear all,

I am happy to inform you that today's EPW carries a piece on "Spatial
variation in the 'Muslim vote' in Gujarat and Uttar Pradesh, 2014",
which I have co-authored with Raheel Dhattiwala:

http://www.epw.in/ejournal/show/1/_/3024

We demonstrate that Muslims' electoral choices vary a lot from
constituency to constituency, implying that vote banks operate on a
much more local level than hitherto assumed. We also explore a few
factors that might shape this variation: minority concentration, riot
history, and ethnic coordination.

More relevant to this list: we also published interactive maps and a
replication dataset under an open license, which contains booth-wise
estimates of the Muslim electorate. Those of you working on religion and
politics might be interested to play with it:

http://www.raphael-susewind.de/blog/2014/

Let me know if you find that data useful,
and/or if you have any questions about it,

Best,
Raphael



Re: [datameet] Mumbai/Thane assembly constituency

2014-09-23 Thread Raphael Susewind
Hi Saurabh,

you might have a look at my dataset - it is of varying quality (because
raw data from the ECI was), but perhaps it does what you need:

http://dx.doi.org/10.4119/unibi/2674065

Best,
Raphael

On 23.09.2014 09:09, Saurabh Datar wrote:
 Hi all,
 
 Is there any shapefile/SVG file for assembly constituencies of Mumbai
 and adjoining Thane? Couldn't find it anywhere. I wish to make some
 visualisations for my own blog. Please help if possible.
 
 thank you 
 



Re: [datameet] Urban constituencies

2014-06-17 Thread Raphael Susewind
Dear Srini,

actually I don't know exactly, as I don't use the indicator myself - you
will have to read the papers by Schneider et al. to figure it out. They are
referenced here:
http://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-urban-area/

Best,
Raphael

On 18.06.2014 04:45, Srinivasan Ramani wrote:
 Dear Raphael, 
 
 Apropos MODIS' urban ranking, can you please clarify the following? A
 higher rank (say 3 over 9) suggests greater urbanisation, right? Or is
 it the other way around? 9 suggests greater urbanisation as compared to 3? 
 
 Thanks,
 Srini
 
 
 On Tue, May 13, 2014 at 3:44 PM, Raphael Susewind
 <li...@raphael-susewind.de> wrote:
 
 Hi Ravi,
 
 I did the matching against MODIS data, but don't have electorate count
 at hand, so no percentages of urban electorate population yet. But I am
 sure you can take it further from the CSV here:
 
 http://www.raphael-susewind.de/ruralurban.csv.tgz
 
 This table shows rural/urban as well as urban rank (an indicator MODIS
 uses for "how urban is it?") across India at booth level. Be aware that
 not all booths are covered by the matching though (and some states,
 notably Uttarakhand, are terribly inaccurate), so you have to aggregate
 wisely.
 
 For fun, I have also added a list of urban booth count share for
 parliamentary constituencies, which should give you a very rough idea of
 electorate share as well, since all booths are supposed to have a
 similar number of electors in them.
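
The urban booth count share mentioned above can be computed along these lines. This is a sketch under assumptions: the record keys `pc` and `urban` are hypothetical and do not reflect the actual column names in the CSV, and the share is a count of booths, not of electors.

```python
from collections import defaultdict


def urban_booth_share(booths):
    """Share of booths flagged urban, per parliamentary constituency."""
    total = defaultdict(int)
    urban = defaultdict(int)
    for booth in booths:
        total[booth["pc"]] += 1
        if booth["urban"]:
            urban[booth["pc"]] += 1
    return {pc: urban[pc] / total[pc] for pc in total}


def mostly_urban(booths, threshold=0.75):
    """Constituencies whose urban booth share exceeds the threshold."""
    shares = urban_booth_share(booths)
    return sorted(pc for pc, share in shares.items() if share > threshold)


if __name__ == "__main__":
    sample = (
        [{"pc": "A", "urban": True}] * 4
        + [{"pc": "A", "urban": False}]
        + [{"pc": "B", "urban": False}] * 2
    )
    print(urban_booth_share(sample))
    print(mostly_urban(sample))
```

Booths that were not matched to the MODIS layer should be excluded before counting, otherwise they silently count as rural.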
 
 Hope it is useful,
 Raphael
 
 On 13.05.2014 10:53, Ravi Krishnan wrote:
  Hi Raphael,
 
  Thanks for your prompt answer. While there is not any official list, I
  am told that the EC gives the percentage of urban population in each
  constituency. I couldn't find it in their web site though.
 
  As for the method you suggest, I just don't have the technical skills to
  pull that off.
 
  Thanks and regards
 
  Ravi
 
 
  On 13 May 2014 14:04, Raphael Susewind <li...@raphael-susewind.de> wrote:
 
  Hi Ravi,
 
  there are various options here, depending on what you want.
 
  I am not aware of an official list of rural/urban constituencies as
  such. But booths are classified as either urban or rural, at least on
  the electoral rolls, and probably elsewhere, too. This could be used in
  a simple counting game to see where more than a certain threshold of the
  electorate votes in urban areas according to the official ECI
  definition.
 
  If you are less interested in the official definition, you could try a
  GIS-based alternative and overlay the polling booth point layer
  (http://dx.doi.org/10.4119/unibi/2674065) with the MODIS rural/urban
  polygon by NaturalEarth, which is quite accurate in terms of habitation
  pattern irrespective of their official designation
  (http://www.naturalearthdata.com/downloads/50m-cultural-vectors/50m-urban-areas/).
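
The overlay described above boils down to a point-in-polygon test for each booth. A GIS package would normally do this; the following is only a plain-Python sketch using the standard ray-casting rule, with made-up coordinates, simple rings without holes, and hypothetical function names.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting: True if (lon, lat) lies inside the ring of (lon, lat) vertices."""
    inside = False
    count = len(ring)
    for i in range(count):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % count]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def is_urban(lon, lat, urban_polygons):
    """A booth counts as urban if it falls inside any urban-area polygon."""
    return any(point_in_polygon(lon, lat, ring) for ring in urban_polygons)


if __name__ == "__main__":
    square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]  # made-up urban area
    print(is_urban(1.0, 1.0, [square]), is_urban(3.0, 1.0, [square]))
```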
 
  Best,
  Raphael
 
  On 13.05.2014 10:08, Ravi Krishnan wrote:
 
   Hi,
  
   Does anyone have a list of urban constituencies - defined here as those
   with over 75% urban population?
   Thanks and regards
  
   --
   Ravi Krishnan
  
   Mint
   Tower 3, 9th Floor, India Bulls Finance Centre,
   Senapati Bapat Marg, Elphinstone Road (W),
   Mumbai - 400 013
   Ph:+91-22-6613 4000/4001
   Mob: +91-97691-72938
  
   --
   For more details about this list
   http://datameet.org/discussions/
   ---
   You received this message because you are subscribed to the Google
   Groups "datameet" group.
   To unsubscribe from this group and stop receiving emails from it, send
   an email to datameet+unsubscr...@googlegroups.com.
   For more options, visit https://groups.google.com/d/optout.
 
  --
  Raphael Susewind | BGHS Bielefeld University, CSASP University
 of Oxford
Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
 Papers  Blog | http://www.raphael-susewind.de
 
  Please do consider http

Re: [datameet] Form20 results for UP and Gujarat

2014-06-17 Thread Raphael Susewind
Dear all,

Matt Lowe pointed me to crawling errors in the original version - they
are now corrected on the datameet github.

Sorry for that,
Raphael

On 10.06.2014 08:54, Raphael Susewind wrote:
 Dear all,
 
 I just added booth-wise results for UP and Gujarat to the datameet
 github - if anybody is working on other states, please contribute, too:
 
 https://github.com/datameet/india-election-data/pull/10
 
 Best,
 Raphael
 



[datameet] Form20 results for UP and Gujarat

2014-06-10 Thread Raphael Susewind
Dear all,

I just added booth-wise results for UP and Gujarat to the datameet
github - if anybody is working on other states, please contribute, too:

https://github.com/datameet/india-election-data/pull/10

Best,
Raphael



[datameet] Booth-wise elector count (male/female/other/total)

2014-06-07 Thread Raphael Susewind
Dear all,

now that Form20 results start to come out, some of you might be
interested in booth-wise elector count to be able to calculate
fine-grained turnout rates. They are not contained in Form20, but
available in the electoral rolls; as a side effect of my ongoing
academic work, I have extracted these.

Here is my pull request to the datameet github:

https://github.com/datameet/india-election-data/pull/8

Note that this is based on a quick-hack automated extraction, so no
guarantees. Also, some states and UTs are missing, notably:

Uttarakhand - PDF rolls not available
Chhattisgarh - PDF rolls behind captcha
Lakshadweep - problem with parsing
Chandigarh - problem with parsing
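
For anyone combining the two sources, the turnout calculation itself is
simple. A minimal sketch, assuming both files have been loaded into
dictionaries keyed by (AC number, booth number); the key layout and the
sample numbers are illustrative assumptions, not the actual column layout
of the files in the pull request:

```python
# Sketch: booth-wise turnout = votes polled (Form20) / electors (rolls).
# Keys and numbers are made up for illustration only.

def turnout_by_booth(electors, votes):
    """Return {(ac, booth): turnout} for booths present in both inputs."""
    rates = {}
    for key, n_electors in electors.items():
        # Skip booths without results and booths with a zero elector count
        # (the latter usually points to an extraction problem).
        if key in votes and n_electors > 0:
            rates[key] = votes[key] / n_electors
    return rates


electors = {(1, 1): 1000, (1, 2): 800, (1, 3): 0}
votes = {(1, 1): 650, (1, 2): 400, (2, 1): 500}
rates = turnout_by_booth(electors, votes)
```

Booths that appear in only one of the two sources are dropped silently
here; for real work it is probably worth listing them, since booth IDs do
not always line up between rolls and Form20.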

I hope this is useful to some,

Best,
Raphael

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Web  Twitter | http://www.raphael-susewind.de | @RaphaelSusewind

Please do consider http://www.gnupg.org for encryption (key id 10AEE42F)



Re: [datameet] Booth-wise elector count (male/female/other/total)

2014-06-07 Thread Raphael Susewind
UP is not yet out as far as I know.
Gujarat is, and some other states...

Best,
Raphael

On 07.06.2014 09:43, Avinash Celestine wrote:
 great. thanks
 
 Have the form 20s for UP been put out? I know Bihar and Bengal are out...
 
 Avinash
 
 

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Web  Twitter | http://www.raphael-susewind.de | @RaphaelSusewind

Please do consider http://www.gnupg.org for encryption (key id 10AEE42F)





Re: [datameet] Urban constituencies

2014-05-13 Thread Raphael Susewind
Hi Ravi,

there are various options here, depending on what you want.

I am not aware of an official list of rural/urban constituencies as
such. But booths are classified as either urban or rural, at least on
the electoral rolls, and probably elsewhere, too. This could be used in
a simple counting game to see where more than a certain threshold of the
electorate votes in urban areas according to the official ECI definition.
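
The counting game can be sketched as follows, assuming each booth is
available as a (constituency, is_urban, electors) tuple. The tuple layout,
the function name and the sample data are hypothetical; the 75% threshold
is the one Ravi asked about:

```python
from collections import defaultdict


def urban_constituencies(booths, threshold=0.75):
    """Constituencies where more than `threshold` of electors vote in urban booths.

    booths: iterable of (constituency, is_urban, electors) tuples.
    """
    urban = defaultdict(int)
    total = defaultdict(int)
    for constituency, is_urban, electors in booths:
        total[constituency] += electors
        if is_urban:
            urban[constituency] += electors
    return sorted(
        c for c in total
        if total[c] > 0 and urban[c] / total[c] > threshold
    )


booths = [
    ("A", True, 900), ("A", False, 100),   # 90% urban electorate
    ("B", True, 500), ("B", False, 500),   # 50% urban electorate
]
result = urban_constituencies(booths)
```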

If you are less interested in the official definition, you could try a
GIS-based alternative and overlay the polling booth point layer
(http://dx.doi.org/10.4119/unibi/2674065) with the MODIS rural/urban
polygon by NaturalEarth, which is quite accurate in terms of habitation
pattern irrespective of their official designation
(http://www.naturalearthdata.com/downloads/50m-cultural-vectors/50m-urban-areas/).
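
The overlay boils down to a point-in-polygon test for every booth. A
dependency-free sketch, assuming the urban areas have already been read
into simple rings of (lon, lat) vertices and the booths into a dict of
coordinates; all names and the toy coordinates are illustrative. In
practice a GIS package would do this faster and would also handle holes
and multipolygons, which this sketch ignores:

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting test; ring is a list of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray going east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def classify_booths(booths, urban_rings):
    """Return {booth_id: True if the booth lies inside any urban ring}."""
    return {
        booth_id: any(point_in_polygon(lon, lat, ring) for ring in urban_rings)
        for booth_id, (lon, lat) in booths.items()
    }


# Toy data: one square "urban area" and two booths.
urban_rings = [[(80.8, 26.7), (81.1, 26.7), (81.1, 27.0), (80.8, 27.0)]]
booths = {"b1": (80.95, 26.85), "b2": (82.0, 26.85)}
labels = classify_booths(booths, urban_rings)
```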

Best,
Raphael

On 13.05.2014 10:08, Ravi Krishnan wrote:

 Hi,
 
 Does anyone have a list of urban constituencies - defined here as those
 with over 75% urban population?
 
 Thanks and regards
 
 -- 
 Ravi Krishnan
 
 Mint
 Tower 3, 9th Floor, India Bulls Finance Centre,
 Senapati Bapat Marg, Elphinstone Road (W),
 Mumbai - 400 013
 Ph:+91-22-6613 4000/4001
 Mob: +91-97691-72938
 
 -- 
 For more details about this list
 http://datameet.org/discussions/
 ---
 You received this message because you are subscribed to the Google
 Groups datameet group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to datameet+unsubscr...@googlegroups.com
 mailto:datameet+unsubscr...@googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Papers  Blog | http://www.raphael-susewind.de

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)



Re: [datameet] Urban constituencies

2014-05-13 Thread Raphael Susewind
Hi Gilles,

nice to see you over here ;-)

This is not based on the Census at all, but on 2002/3 MODIS satellite
imagery, processed by NASA to classify land cover as inhabited or not,
rural or urban (funnily enough, part of the criteria is light at night
- they must have come up with something else for India though, at least
in UP there is no power at 3am ;-)...). As such, it reflects the
rural/urban divide in terms of physical geography a decade ago pretty
accurately, but not necessarily by the GoI definition. It's a rough fix
until we get a nice, easily browsable list of the ECI's own booth-wise
rural/urban classification for 2014...

On Tonk and Sawai Madhopur: this is an odd slip-up in the AC-to-PC
conversion in my scripts, thanks for noticing. To correct it, have a look
at the raw data in ruralurban.csv.tgz and re-calculate from the AC list.
In the meantime, I shall check what went wrong with the PC classification.
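
A rough sketch of such a re-calculation, assuming the booth rows have been
reduced to (AC, is_urban) pairs and that a separate AC-to-PC lookup is at
hand. The row layout, the lookup and the placeholder names are assumptions
for illustration and not the actual structure of ruralurban.csv:

```python
from collections import Counter


def urban_booth_share_by_pc(booth_rows, ac_to_pc):
    """Share of urban booths per parliamentary constituency.

    booth_rows: iterable of (ac, is_urban) pairs, one per booth.
    ac_to_pc:   dict mapping each assembly constituency to its PC.
    """
    urban = Counter()
    total = Counter()
    for ac, is_urban in booth_rows:
        pc = ac_to_pc.get(ac)
        if pc is None:
            continue  # AC missing from the lookup: skip rather than guess
        total[pc] += 1
        if is_urban:
            urban[pc] += 1
    return {pc: urban[pc] / total[pc] for pc in total}


ac_to_pc = {"AC1": "PC-X", "AC2": "PC-X", "AC3": "PC-Y"}
booth_rows = [
    ("AC1", True), ("AC1", False),
    ("AC2", True), ("AC2", True),
    ("AC3", False),
    ("AC9", True),  # not in the lookup, ignored
]
shares = urban_booth_share_by_pc(booth_rows, ac_to_pc)
```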

Best,
Raphael

On 13.05.2014 22:52, gilles.verni...@sciencespo.fr wrote:
 Is this 2001 census, by the way? Is it valid to juxtapose to 2014
 constituencies? 
 Thanks!
 
 Gilles 
 
 Le mardi 13 mai 2014 15:44:19 UTC+5:30, Raphael Susewind a écrit :
 
 Hi Ravi,
 
 I did the matching against MODIS data, but don't have electorate count
 at hand, so no percentages of urban electorate population yet. But I am
 sure you can take it further from the CSV here:
 
 http://www.raphael-susewind.de/ruralurban.csv.tgz
 
 This table shows rural/urban as well as urban rank (an indicator MODIS
 uses for "how urban is it?") across India on booth level. Be aware that
 not all booths are covered by the matching though (and some states,
 notably Uttarakhand, are terribly inaccurate), so you have to aggregate
 wisely.
 
 For fun, I have also added a list of urban booth count share for
 parliamentary constituencies, which should give you a very rough idea of
 electorate share as well, since all booths are supposed to have a
 similar number of electors in them.
 
 Hope it is useful,
 Raphael
 
 On 13.05.2014 10:53, Ravi Krishnan wrote:
  Hi Raphael,
 
  Thanks for your prompt answer. While there is not any official list, I
  am told that the EC gives the percentage of urban population in each
  constituency. I couldn't find it on their web site though.
 
  As for the method you suggest, I just don't have the technical skills to
  pull that off.
 
  Thanks and regards
 
  Ravi  
 
 

Re: [datameet] [Article] Limitations of the PDF

2014-05-12 Thread Raphael Susewind
Let's hope the Election Commission reads this before declaring results...

On 12.05.2014 10:46, Sriram Karra wrote:
 
 http://www.thehindu.com/opinion/op-ed/limitations-of-the-pdf/article5998841.ece
 
 == snip ==
 
 
 The basic format doesn’t include any requirement that text be
 selectable or searchable, while data presented as charts and tables
 is often impossible to export in any useable way.
 
 It’s the standard file format for nearly every academic paper, political
 briefing and research note. But a new report by the World Bank suggests
 that the venerable pdf is keeping valuable information buried in
 servers, unread and unloved.
 
 == /snip ==
 

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Papers  Blog | http://www.raphael-susewind.de

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)



Re: [datameet] Re: Looking for KML file of India's parliamentary constituencies, need it urgently!

2014-04-28 Thread Raphael Susewind
Ravi, alternatively you might want to try TileMill / TileStream from
mapbox.com to render the files once, and then serve tiles only...
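
To give an idea of what "serving tiles only" means: the browser then asks
for small pre-rendered images addressed by zoom/x/y instead of loading the
whole 35MB geometry. A sketch of the usual XYZ ("slippy map") tile
arithmetic follows; that the tile server uses this scheme is an
assumption, and the function name is made up:

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a WGS84 point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    # Web Mercator projection of the latitude, scaled to the tile grid.
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


# Tile covering a point near Delhi at zoom level 4.
delhi_tile = lonlat_to_tile(77.2, 28.6, 4)
```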

Best,
Raphael

On 28.04.2014 10:48, Thejesh GN wrote:
 Ravi,
 It generates a KML of 35MB, which is huge for web apps.
 
 Google Maps/API has limitations with respect to the size of KML files:
 https://developers.google.com/kml/documentation/mapsSupport
 
 --
 But it works on Leaflet using 
 https://gis.stackexchange.com/questions/33513/how-do-i-overlay-a-kml-on-leaflet-0-4-4
 
 But even rendering it locally takes about 5 minutes.
 
 So I agree with Srinivasan Ramani: try to simplify it.
 
 Thej
 --
 Thejesh GN *⏚* ತೇಜೇಶ್ ಜಿ.ಎನ್
 http://thejeshgn.com
 GPG ID :  0xBFFC8DD3C06DD6B0
 
 
 On Mon, Apr 28, 2014 at 11:16 AM, Ravi Bajpai bajpair...@gmail.com
 mailto:bajpair...@gmail.com wrote:
 
 Ok, so I converted the .shp file to .kml. Fusion Tables refused to
 parse the file. So then I uploaded the .kml file to Google Drive.
 When I open it in Google Maps from there, the application says it
 encountered a problem with some data, and doesn't show anything on the
 map.
 
 Please help!
 
 Best,
 
 Ravi
 
 On Sunday, April 27, 2014 7:20:28 PM UTC+5:30, Ravi Bajpai wrote:
 
 Hey all,
 
 I work for Hindustan Times here in Delhi. I am trying to prepare
 backend logistics to build an interactive map to be published on
 our website on the election counting day. I need KML file of
 India's parliamentary constituencies, but I can't seem to find
 it anywhere.
 
 Please help.
 
 Thanks a lot.
 
 Best,
 
 Ravi Bajpai
 Multimedia Editor
 Hindustan Times
 

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Papers  Blog | http://www.raphael-susewind.de

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)





Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-04-17 Thread Raphael Susewind
Dear all,

just a follow-up to this oldish thread: I recently switched to the
newest version of Tesseract OCR to transform buggy PDF rolls to text,
and it works surprisingly well. There are small typos here and there, but
those can be rectified. Just in case anyone else is looking for a solution
to this...
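
One way such small typos could be rectified, sketched with Python's
standard library only: snap each OCR token to the closest entry of a list
of known names if it is similar enough, and leave it alone otherwise. The
name list, the cutoff and the function name are illustrative assumptions,
not a description of my own scripts:

```python
import difflib


def correct_ocr_token(token, known_names, cutoff=0.8):
    """Replace token by its closest known name, if one is similar enough."""
    matches = difflib.get_close_matches(token, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else token


known_names = ["Lucknow", "Kanpur", "Agra"]
fixed = [correct_ocr_token(t, known_names) for t in ["Kanpvr", "Lucknaw", "Mohalla"]]
```

The cutoff needs tuning per script and per field: too low and distinct
names get merged, too high and genuine OCR slips stay uncorrected.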

Best,
Raphael

On 13.03.2014 08:03, Raphael Susewind wrote:
 Hey Avinash,
 
 yep - that's what I figured, too. Not only misplaced matras (those could
 be rearranged), but a real garbling, which cannot be resolved as far as
 I can see. Worse, there isn't even a clear pattern - for a few
 constituencies, I fed the Voter ID (which is in Latin script) to the
 "search roll details by voter ID" function on the CEO website, which
 returns the properly written Unicode name. I then compared garbled name
 and Unicode name to see if there are any statistical regularities - yet
 unfortunately, there are a thousand ways of garbling "Avinash" - it's not
 always "Abniszhaa".
 
 The only solution I can think of is the following (but I have not
 implemented it): train Tesseract (an OCR engine that supports Indic
 scripts) with the exact font used in the PDF reports, so that it almost
 perfectly recognizes something written in this font (this was a
 stumbling block for me, rather complicated work), then extract images of
 the text areas of interest, and run them through OCR. If you want to give
 it a shot...
 
 Otherwise, we could only try to convince the EC to fix the bug in
 Crystal Reports, and re-generate all PDFs - which is highly unlikely,
 they have more important things to do right now (the PDFs display and
 print alright, after all, just text extraction does not work - they
 would perhaps even consider it a feature rather than a bug).
 
 It might be useful to compile a list of states where this problem occurs
 - I have seen it in Gujarat and UP for sure, but don't know whether it
 happens everywhere.
 
 Best,
 Raphael
 
 On 13.03.2014 05:35, Avinash Celestine wrote:
 well i checked out the unicode table and it only confirms what we knew
 anyway... that there's duplication of unicode hex values for different
 characters... 

 So i guess its back to the drawing board.


 On Thu, Mar 13, 2014 at 9:43 AM, Avinash Celestine
 avinash.celest...@gmail.com mailto:avinash.celest...@gmail.com wrote:

 Hi Raphael

 In fact the problem with the UP rolls is exactly what I am grappling
 with now. It seems to me that one way is to look at the exact
 mapping of Unicode characters embedded within the files. One way of
 generating such maps is to use a plugin like PDFlib's FontReporter,
 which works with Adobe Acrobat
 (http://www.pdflib.com/products/fontreporter/). Have you
 tried out this method and did it work for you? Do tell me if you (or
 anyone else) have given it a shot. I am planning to give it a go
 at least...

 I have attached a sample roll (of an AC in Agra), along with the
 generated font report if anyone wants to give it a look

 A closer look at the roll shows that the main problem seems to be
 with the Devanagari 'matras' which are not rendering correctly when
 you cut and paste

 regards

 Avinash


 On Wed, Mar 12, 2014 at 12:19 PM, Raphael Susewind
 li...@raphael-susewind.de mailto:li...@raphael-susewind.de wrote:

 Hey Siddhart, and Anand,

 I, too, am really interested in this, but have not made much progress
 yet. I think there are two ways to do this, neither of which is
 straightforward.

 The "extract ward/village mentioned in roll PDF" strategy is one option.
 Depending on the raw data, this can however be cumbersome (one source in
 the vernacular, one in Latin script, etc); I know a couple of scholars who
 attempt to do this and they are stuck all the time, having had to
 match manually rather frequently (which is a pain given that there are
 800,000 or so polling stations).

 Currently, we have the additional problem that many of the current roll
 PDFs - for instance in UP - are broken: one cannot copy-paste (or
 pdftotext, or extract through whatever means) from them, chiefly because
 the ToUnicode CMap is corrupted by the version of Crystal Reports the ECI
 is using. There is no real workaround other than reverse-OCR, which is a
 pain-in-the-a**. Let me know if you figure out another way...

 The second option would be a very different strategy, namely GIS
 matching through nearest neighbour analysis: what is the closest Census
 village/ward around that particular polling booth (or the other way
 round - the computational challenge is to match ALL booths to at least
 one ward AND vice versa). Unfortunately, Census village/ward lat/long is
 not in the public domain, as far as I can see - and using proprietary data

[datameet] Census village to polling booth matching

2014-04-17 Thread Raphael Susewind
Dear all,

I vaguely remember that some people are working on matching census
villages to polling booths, and wonder what progress they made. As some
of you know, I am currently doing this India-wide through an automated
spatial matching algorithm - but before releasing the result, it would
be nice to assess accuracy of this procedure more thoroughly.

The key problem I face is that polling stations are often not named
"village X" but "primary school founded by Y" - so that name matching
does not help too much in validation (certainly not in urban areas).

It would be better to check against roll part names (thus my email about
those a few days ago), but best would be if anyone has a manually
matched table of polling stations (2014 IDs) against PCLN (2011) or MDDS
(2011) census codes with which I could compare my results - if only at
the example of one state, or a few districts.
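
The comparison I have in mind is nothing fancy. A sketch, assuming both
the automated result and the manual reference are available as
dictionaries from polling station ID to census code; the IDs, codes and
function name below are invented for illustration:

```python
def matching_accuracy(automated, manual):
    """Compare two {station_id: census_code} tables on their common stations.

    Returns (share of agreeing stations, sorted list of disagreeing IDs),
    or (None, []) if the tables have no station in common.
    """
    common = set(automated) & set(manual)
    if not common:
        return None, []
    mismatches = sorted(s for s in common if automated[s] != manual[s])
    accuracy = 1 - len(mismatches) / len(common)
    return accuracy, mismatches


automated = {"S1": "C10", "S2": "C11", "S3": "C12", "S4": "C13"}
manual = {"S1": "C10", "S2": "C99", "S3": "C12", "S5": "C14"}
accuracy, mismatches = matching_accuracy(automated, manual)
```

Stations present in only one table are ignored, so even a reference
covering a single district would give a usable estimate.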

Alternatively, if somebody has time to spare and is familiar with a
specific district in greater detail, I could send along a
matching table for this district to see how well it fits. Please get in
touch by direct mail in this case...

Any other ideas on how to validate the matching table are welcome,

Best,
Raphael

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Papers  Blog | http://www.raphael-susewind.de

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)



Re: [datameet] Security Issues with the Voter List

2014-04-14 Thread Raphael Susewind
As a follow-up to this discussion:

electoralsearch.in began to implement rate limiting and selective IP
blocking yesterday. Sad as this is for my own research purposes, I
welcome the step from a privacy point of view...

Raphael

On 11.04.2014 10:56, Chandrashekhar Raman wrote:
 Raphael, to clarify, I am not trying to make a case against the
 availability of fine-grained data, far from it - I'm with you on this
 argument, among others that are made spuriously to restrict access. I
 might have stretched the point, but then again, killing is just one
 extreme form of discrimination; there are others that are less visible.
 
 You summed it up very well: it's good to have a healthy caution and
 unease when dealing with some of this data; there are probably no simple
 answers here.
 
 Will read the paper at leisure.
 
 cs.
 
 
 On Fri, Apr 11, 2014 at 12:37 PM, Raphael Susewind
 li...@raphael-susewind.de mailto:li...@raphael-susewind.de wrote:
 
 Chandrashekhar,
 
 just on the specific issues of targeting communities, which I have
 thought about a great deal (my first book was on post-2002 Gujarat), my
 tentative conclusion is this:
 
 The fact that electoral rolls had been used in the past in riots before
 they were available online shows that rioters, if they want to, can
 access this data already. As Gautam pointed out, it IS public by law.
 What changes is merely the scale of data availability. Large-scale data
 would only be 'more useful' for large-scale targeting, however
 (small-scale targeting is possible already), which I don't see happening
 at this time (with the troublesome exception of Gujarat, particularly
 troublesome now that Mr Modi runs for PM - but here, too, the targeting
 happened in small units on the ground, even though coordination took
 place higher up). On the other hand, fine-grained large-scale data is
 absolutely necessary to understand a range of issues about (religious,
 caste) economic position. So that in this specific case, we have
 additional benefits but no additional risk (beyond the worrisome risk
 already out there)...
 
 More detailed arguments about this in a forthcoming paper of mine at
 http://pub.uni-bielefeld.de/publication/2631138
 
 Best,
 Raphael
 
 On 11.04.2014 08:49, Chandrashekhar Raman wrote:
  Raphael, you raise very pertinent issues.
 
  We as a community love open data and in this country there is a lot that
  can be done to free all kinds of data so that it can be made use of in a
  good way (election data in an aggregated form is one example). But at
  the same time there are certain kinds of data which are not open (I
  mean not open in a machine readable format) for a good reason. I believe
  voter rolls data is one such type. In the past voter lists have been
  used to pinpoint members of specific communities which were then
  targeted with gruesome effect. Shudder to think what happens if it is
  automated, a 'riot app'?
 
  As Raphael points out this is not just about privacy, but could be much
  worse.
 
  This group is a fantastic initiative and as it evolves, it would be
  great for us to involve more social scientists and policy experts - so
  as we advocate vociferously to free more data and make it open - we can
  also bring in the technical expertise here to recommend where data needs
  to be better protected and how.
 
  cs
 
 
  On Fri, Apr 11, 2014 at 11:44 AM, Raphael Susewind
  li...@raphael-susewind.de mailto:li...@raphael-susewind.de
 mailto:li...@raphael-susewind.de
 mailto:li...@raphael-susewind.de wrote:
 
  Hi Devdatta and Avinash,
 
  yes, I, too, am frankly surprised at the ease with which one can access
  sensitive data in bulk. Not only PDF rolls and voter details, but also
  things such as land records, BPL lists, and much more - I think we are
  in an exciting as well as dangerous phase of fairly uncontrolled,
  nascent e-Governance practices. But I think the ethical issues here are
  a little more complex than mere privacy concern.
 
  Upfront, I must admit that I use all the above sources for academic
  research (in UP and across India). What Avinash described in principle
  and at the example of Delhi can indeed be done on an all-India scale,
  and I am sure there are more people than just me who do it.
 
  But then the social sciences have long dealt with sensitive data and
  developed protocols to protect it. Even though the data is publicly
  available, I for instance have my own copy on a secure workstation with
  full disk encryption and two factor authentication. Whenever possible, I
  also work

Re: [datameet] Polling station names and roll part names

2014-04-14 Thread Raphael Susewind
Hi Anand,

thanks, I have the psleci data already (it's the basis for my electoral
maps). As for the part names, I have looked around on the UP CEO site,
and found that the BLO detail search function contains both part name
and station name - but for an all-India solution, I think I will have to
query electoralsearch.in slowly, so that the rate limiting does not
kick in...

Let's see,
Raphael

On 14.04.2014 10:54, Anand Chitipothu wrote:
 Hi Raphael,
 
 It is possible to get the polling station names from:
 http://www.eci-polldaymonitoring.nic.in/psleci/Default.aspx
 
 I have a scrape of that data I can share with you if you want it.
 
 But if you are looking for part names etc., I can't think of any other way
 than hitting the election commission website with one query per polling
 booth. I don't think one query per booth should be considered mass
 queries. Did you try searching on the state election commission website
 with voter ID?
 
 Anand
 
 
 
 On Mon, Apr 14, 2014 at 12:50 PM, Raphael Susewind
 li...@raphael-susewind.de mailto:li...@raphael-susewind.de wrote:
 
 Dear all,
 
 I am trying to find a list that links polling station names (usually
 something like "City Montessori School Room 1") and roll part names
 (usually something like "Mohalla XY"), preferably in Latin script.
 
 The PDF rolls have both on the front page, but a) in regional
 scripts and b) usually not extractable (the encoding bug we already
 discussed on this list).
 
 electoralsearch.in shows both if one searches with an EPIC id from
 that particular booth, but they shut out mass queries (and rightly so).
 
 Does anybody know of any other scrapable data source for this?
 
 Best,
 Raphael
 
 --
 Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
   Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
Papers  Blog | http://www.raphael-susewind.de
 
 Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)
 
 
 
 
 
 -- 
 Anand
 http://anandology.com/
 

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Papers  Blog | http://www.raphael-susewind.de

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)



Re: [datameet] Polling station names and roll part names

2014-04-14 Thread Raphael Susewind
Hi Anand,

short update: electoralsearch.in does return wrong part names (basically
they copy the station name field), at least for UP. So it's back to the
CEO sites...

Raphael

On 14.04.2014 10:54, Anand Chitipothu wrote:
 Hi Raphael,
 
 It is possible to get the polling station names from:
 http://www.eci-polldaymonitoring.nic.in/psleci/Default.aspx
 
 I have a scrap of that data I can share with you if you want it.
 
 But if are looking for part name etc, I can't think of any other way
 than hitting election commission website with one query for polling
 booth. I don't think one query per booth should be considered mass
 queries. Did you try searching one the state election commission website
 with voter id?
 
 Anand
 
 
 
 On Mon, Apr 14, 2014 at 12:50 PM, Raphael Susewind
 li...@raphael-susewind.de mailto:li...@raphael-susewind.de wrote:
 
 Dear all,
 
 I am trying to find a list that links polling station names (usually
 something like City Montessory School Room 1) and roll part names
 (usually something like Mohalla XY), preferably in latin script.
 
 The PDF rolls have both data on the frontpage, but a) in regional
 scripts and b) usually not extractable (the encoding bug we already
 discussed on this list).
 
 electoralsearch.in http://electoralsearch.in shows both data if
 one searches with an EPIC id from
 that particular booth, but they shut out mass queries (and rightly so).
 
 Does anybody know of any other scrapable data source for this?
 
 Best,
 Raphael
 
 --
 Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
   Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
Papers  Blog | http://www.raphael-susewind.de
 
 Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)
 
 --
 For more details about this list
 http://datameet.org/discussions/
 ---
 You received this message because you are subscribed to the Google
 Groups datameet group.
 To unsubscribe from this group and stop receiving emails from it,
 send an email to datameet+unsubscr...@googlegroups.com
 mailto:datameet%2bunsubscr...@googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.
 
 
 
 
 -- 
 Anand
 http://anandology.com/
 

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Papers & Blog | http://www.raphael-susewind.de

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)



Re: [datameet] Security Issues with the Voter List

2014-04-11 Thread Raphael Susewind
 to
 conform to the government format, and fail the Luhn Checksum
 test used to validate them. It is likely that other states are
 in a similar, if not worse condition
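
For reference, the Luhn test mentioned above works on a string of digits:
starting from the rightmost digit, every second digit is doubled (with 9
subtracted if the result exceeds 9), and the sum of all digits must be
divisible by 10. The following is a minimal sketch only. Which part of
the identifiers carries the check digit is not stated in this thread, so
the function simply takes a plain digit string; the name luhn_valid is
illustrative.

```python
def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    # Assumption: the caller has already extracted the numeric part.
    if not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0
```

An identifier would count as failing the test when luhn_valid returns
False for its numeric part.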
 
 
 Regards,
 
 Devdatta Tengshe
 
 

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Papers & Blog | http://www.raphael-susewind.de

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)



Re: [datameet] Security Issues with the Voter List

2014-04-11 Thread Raphael Susewind
Chandrashekhar,

Just on the specific issue of targeting communities, which I have
thought about a great deal (my first book was on post-2002 Gujarat), my
tentative conclusion is this:

The fact that electoral rolls were used in riots before they were
available online shows that rioters, if they want to, can access this
data already. As Gautam pointed out, it IS public by law. What changes
is merely the scale of data availability. Large-scale data would only
be 'more useful' for large-scale targeting, however (small-scale
targeting is possible already), which I don't see happening at this
time (with the troublesome exception of Gujarat, particularly
troublesome now that Mr Modi runs for PM - but here, too, the targeting
happened in small units on the ground, even though coordination took
place higher up). On the other hand, fine-grained large-scale data is
absolutely necessary to understand a range of issues about the economic
position of (religious, caste) groups. So in this specific case, we
have additional benefits but no additional risk (beyond the worrisome
risk already out there)...

More detailed arguments about this in a forthcoming paper of mine at
http://pub.uni-bielefeld.de/publication/2631138

Best,
Raphael

On 11.04.2014 08:49, Chandrashekhar Raman wrote:
 Raphael, you raise very pertinent issues.
 
 We as a community love open data, and in this country there is a lot that
 can be done to free all kinds of data so that it can be put to good
 use (election data in aggregated form is one example). But at
 the same time there are certain kinds of data which are not open (I
 mean not open in a machine-readable format) for a good reason. I believe
 voter roll data is one such type. In the past, voter lists have been
 used to pinpoint members of specific communities, who were then
 targeted with gruesome effect. I shudder to think what happens if this
 is automated: a 'riot app'?
 
 As Raphael points out this is not just about privacy, but could be much
 worse.
 
 This group is a fantastic initiative and as it evolves, it would be
 great for us to involve more social scientists and policy experts - so
 as we advocate vociferously to free more data and make it open - we can
 also bring in the technical expertise here to recommend where data needs
 to be better protected and how.
 
 cs
 
 
 On Fri, Apr 11, 2014 at 11:44 AM, Raphael Susewind
 li...@raphael-susewind.de mailto:li...@raphael-susewind.de wrote:
 
 Hi Devdatta and Avinash,
 
 yes, I, too, am frankly surprised at the ease with which one can access
 sensitive data in bulk. Not only PDF rolls and voter details, but also
 things such as land records, BPL lists, and much more - I think we are
 in an exciting as well as dangerous phase of fairly uncontrolled,
 nascent e-Governance practices. But I think the ethical issues here are
 a little more complex than mere privacy concerns.
 
 Upfront, I must admit that I use all the above sources for academic
 research (in UP and across India). What Avinash described in principle,
 using the example of Delhi, can indeed be done on an all-India scale,
 and I am sure there are more people than just me who do it.
 
 But then the social sciences have long dealt with sensitive data and
 developed protocols to protect it. Even though the data is publicly
 available, I for instance have my own copy on a secure workstation with
 full disk encryption and two factor authentication. Whenever possible, I
 also work on anonymized subsets of data. Yet there are other potential
 uses - some of the more worrisome you pointed out - which are not bound
 by such data protection standards.
 
 To me, this once more highlights the nascent stage of ethical standards
 around Big Data and eGovernance. On the plus side, I am happy to have
 that kind of access to conduct research which will ultimately be
 ethically beneficial, leading to better understanding of social issues
 and potentially to better policy advice. Also, there is a point to be
 made that transparency is an important asset in elections in particular,
 not only in terms of individual electoral search functions, but also in
 terms of publicly accessible (and cross-checkable, publicly verifiable)
 PDF rolls. Finally, a lot of this data was available in the past as
 well, only in distributed and/or commercial form, which meant there was
 a hierarchy of access: small-time crooks could not use it, but
 big-time crooks were always able to use it. Likewise, scholars at
 large (often foreign) universities were able to use it, but not smaller
 ones (this is still true for some data, geodata in particular, which I
 can only access because of Ivy-League contacts and only process because
 of an association with Oxford University).
 
 The ethical challenge as I see it thus comes not from data availability

Re: [datameet] Please Comment on Copyright License for DataMeet Work

2014-04-11 Thread Raphael Susewind
Hi all,

There is a good comparison of CC vs ODbL when applied to data at
http://www.dcc.ac.uk/resources/how-guides/license-research-data

Also, is there any specific reason to use CC 2.0? There are CC 4.0
licenses already, arguably more developed (and also more suitable for
data, see link above)...

My five cents,
Raphael

On 11.04.2014 09:24, Thejesh GN wrote:
 This is for the work related to DataMeet, produced by DataMeet as part
 of events, hackathons or general work, and for what sits on one of the
 DataMeet accounts, like
 https://github.com/datameet
 https://www.youtube.com/user/datameet
 
 _This doesn't apply to work by individuals themselves._
 
 I am listing the license and thought process behind them. Please do comment.
 
 ---
 *For artifacts: **CC BY-SA 2.0*
 https://creativecommons.org/licenses/by-sa/2.0/
 *Idea:* Allow everyone to use it, in any way they want, as long as they
 attribute and share in similar way
 
 Share — copy and redistribute the material in any medium or format
 Adapt — remix, transform, and build upon the material for any purpose,
 even commercially.
 
 Attribution — You must give appropriate credit, provide a link to the
 license, and indicate if changes were made. You may do so in any
 reasonable manner, but not in any way that suggests the licensor
 endorses you or your use.
 ShareAlike — If you remix, transform, or build upon the material, you
 must distribute your contributions under the same license as the original. 
 
 
 *For code: GNU/GPL*
 https://www.gnu.org/copyleft/gpl.html
 Allows commercial use and requires share-alike, similar to (but not the
 same as) CC BY-SA 2.0
 
 - Allows remix, share, distribute (all four freedoms)
 - Allows commercial usage
 - Makes attribution and share-alike compulsory
 
 
 
 *For Data : Open Data Commons Open Database License (ODbL)*
 
 If we want to use a specific license for data then we can use this. It
 is similar to CC BY-SA 2.0: http://opendatacommons.org/licenses/odbl/summary/
 
 You are free:
 To Share: To copy, distribute and use the database.
 To Create: To produce works from the database.
 To Adapt: To modify, transform and build upon the database.
 As long as you:
 Attribute: You must attribute any public use of the database, or works
 produced from the database, in the manner specified in the ODbL. For any
 use or redistribution of the database, or works produced from it, you
 must make clear to others the license of the database and keep intact
 any notices on the original database.
 Share-Alike: If you publicly use any adapted version of this database,
 or works produced from an adapted database, you must also offer that
 adapted database under the ODbL.
 Keep open: If you redistribute the database, or an adapted version of
 it, then you may use technological measures that restrict the work (such
 as DRM) as long as you also redistribute a version without such measures.
 -
 
 
 Note: If we are extending someone's code/data/artifact, we can continue
 to use the license which the original author has used. It's easier that
 way. If we start something fresh we can use one of ours.
 
 Let's discuss this on the list. I will blog the conclusions/results on
 datameet.org/blog next Wednesday for future
 reference. 
 
 
 Thanks a lot for your time.
 
 
 Thej
 --
 Thejesh GN *⏚* ತೇಜೇಶ್ ಜಿ.ಎನ್
 http://thejeshgn.com
 GPG ID :  0xBFFC8DD3C06DD6B0
 

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Papers & Blog | http://www.raphael-susewind.de

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)



Re: [datameet] Please Comment on Copyright License for DataMeet Work

2014-04-11 Thread Raphael Susewind
An additional advantage of ODbL is that different parts of a compound
dataset can have different licenses, which makes it easier to pull
together material from different sources.

On 11.04.2014 09:50, Thejesh GN wrote:
 
 We can use  CC-BY-SA-4.0 for artifacts. It looks better and has
 everything CC-BY-SA-2.0 has
  
 https://creativecommons.org/licenses/by-sa/4.0/
 
 Share — copy and redistribute the material in any medium or format
 Adapt — remix, transform, and build upon the material
 for any purpose, even commercially.
 
 As long as
 Attribution — You must give appropriate credit, provide a link to the
 license, and indicate if changes were made. You may do so in any
 reasonable manner, but not in any way that suggests the licensor
 endorses you or your use.
 
 ShareAlike — If you remix, transform, or build upon the material, you
 must distribute your contributions under the same license as the original.
 
 No additional restrictions — You may not apply legal terms or
 technological measures that legally restrict others from doing anything
 the license permits.
 
 
 
 I think ODC-ODbL is a good choice for data. It allows all kinds of usage,
 along with attribution, share-alike and keep-open conditions. Unless we
 have a better choice, I think we can go with ODC-ODbL. 
 
 
 
 
 Thej
 --
 Thejesh GN *⏚* ತೇಜೇಶ್ ಜಿ.ಎನ್
 http://thejeshgn.com
 GPG ID :  0xBFFC8DD3C06DD6B0
 
 

[datameet] Mapping elections: open GIS shapefile drafts

2014-04-09 Thread Raphael Susewind
Dear all,

Krishna Prasnth's plea for AC shapefiles made me decide to start pushing
mine out ahead of time, in draft form at least. I would have loved
to have them ready before the Bangalore hackathon, but such things take
time and I am quite busy.

Still, here they come at last: draft GIS shapefiles of parliamentary
constituencies, assembly constituencies and polling booth localities,
published under an open license (CC-BY-NC-SA 4.0):

http://www.raphael-susewind.de/blog/2014/mapping-indias-election

Unlike the hackathon files, these were created using an automated
algorithm (described in the blog post above). I intend to release them
(and archive them long-term) by the end of the month, and would welcome
comments and feedback until then: if you are familiar with both GIS and
a specific state, it would help me a lot if you could have a look.
Likewise, comments on the general method are very welcome.

So far, the smaller states are online, but I will add more on a rolling
basis - computing takes a few hours per constituency (longer for the
larger states). I hope to complete the set by the end of the week.

Let me know if you find them useful,

Best,
Raphael

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Papers & Blog | http://www.raphael-susewind.de

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)



Re: [datameet] PDF scraping

2014-04-08 Thread Raphael Susewind
With Linux and xpdf-tools it's as easy as

pdftotext xyz.pdf
wc -w xyz.txt
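
Note that wc -w reports only the total number of words in the file. If a
count per distinct word is wanted, the text file that pdftotext writes
can be tallied with a few lines of Python. This is a minimal sketch, not
part of the original reply: the function name count_words is
illustrative, and the simple tokenization (lowercased runs of word
characters) is an assumption that may need adapting for the PDFs at hand.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    # \w+ matches runs of letters, digits and underscores
    # (Unicode-aware for str patterns in Python 3).
    words = re.findall(r"\w+", text.lower())
    return Counter(words)
```

Reading xyz.txt into a string and calling count_words(text).most_common(20)
would then list the twenty most frequent words with their counts.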

Best,
Raphael

On 08.04.2014 20:44, Eric Dodge wrote:
 Seems like there are 2 steps here, getting the text into a more usable
 format and then getting the word counts. There are programs that let you
 dump pdf into text (http://pdf2txt.software.informer.com/3.2/ for
 example) in batches. Then paste the text into a tool like this
 (http://www.textfixer.com/tools/online-word-counter.php) to get the word
 counts.
 
 Eric
 
 
 On Tue, Apr 8, 2014 at 12:54 AM, Suren Makkar suren.mak...@gmail.com
 mailto:suren.mak...@gmail.com wrote:
 
 Hey guys,
 
 Quick Rookie question, I'm trying to get total word counts for all
 occurring words in a bunch of PDFs, and I am lost. Help?
 
 
 

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Papers & Blog | http://www.raphael-susewind.de

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)



Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-03-16 Thread Raphael Susewind
Hi Siddarth,

for my UP dataset, I used spatial matching of polling booth locations
against the MODIS urban extent satellite layer of 2002 - this tends to
capture only the larger urban centres, though. Another option is to look
at how many polling stations have multiple booths [polling stations being
defined as booths with almost the same name in almost the same location] -
this turned out to be a rather accurate (and up-to-date) representation
of urban areas as well as small towns - only truly rural stations have
just one booth, in my experience (UP)...
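
The multiple-booth heuristic can be sketched as follows. This is an
illustration under stated assumptions, not the code used for the UP
dataset: the record layout (name, latitude, longitude), the thresholds
of 0.8 name similarity and 100 metres, the suffix pattern stripped from
names, and all function names are hypothetical.

```python
import math
import re
from difflib import SequenceMatcher


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def base_name(name):
    """Lowercase the name and drop a trailing 'room N' / 'no. N' suffix."""
    # Assumed pattern; real booth names may need different rules.
    return re.sub(r"\s*(?:room|no\.?)?\s*\d+\s*$", "", name.lower()).strip()


def group_stations(booths, min_similarity=0.8, max_distance=100.0):
    """Greedily group booths whose names and locations almost match."""
    stations = []  # each station is a list of (name, lat, lon) booths
    for booth in booths:
        name, lat, lon = booth
        for station in stations:
            ref_name, ref_lat, ref_lon = station[0]
            ratio = SequenceMatcher(
                None, base_name(name), base_name(ref_name)).ratio()
            near = distance_m(lat, lon, ref_lat, ref_lon) <= max_distance
            if ratio >= min_similarity and near:
                station.append(booth)
                break
        else:
            # No existing station matched, so this booth starts a new one.
            stations.append([booth])
    return stations
```

Stations with more than one booth would then be the candidates for
urban or small-town locations; the greedy comparison against the first
booth of each station keeps the sketch short but is quadratic in the
worst case.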

Best,
Raphael

On 16.03.2014 06:03, Siddarth Raman wrote:
 Hi Avinash,
 
 Thanks a ton for pointing out the excel files with delimitation. I read
 what you wrote. Will take a look at the zip file and cross-check. I too
 had hoped the district mapping was contiguous with some political
 boundaries, but it isn't. Bangalore, funnily, has a ward (44 I think)
 which is split across three different patches of land which don't share
 a boundary! 
 
 For those interested in more background regarding the why of it all...
 
 I was curious to understand what, according to anyone, is an urban
 parliamentary constituency. Mint had done a study a while back
 (http://www.livemint.com/Specials/XovcjYRkWCBLJSwQwxY6wN/India-has-only-53-predominantly-urban-constituencies.html);
 their main source was the million-plus cities of India as per the census.
 That sparked off the thought. I wanted to dig deeper. I thought that
 while one might disagree with the census definition of urban, it's a
 basis to begin with. Was hoping to look at all PCs and ACs with a % urban.
 Over 50% would imply an urban constituency (perhaps not the best method,
 but seemed like a good start)
 
 I guess it isn't as easy as I imagined, but still would be good to
 figure out. Do let me know if anyone has other ideas.
 
 Regards,
 Siddarth
 
 
 On Saturday, March 15, 2014 2:31:34 PM UTC+5:30, Avinash Celestine wrote:
 
 hmm yes that's true. it's basically an inefficient way to engineer
 seat gains - there are many other more efficient ways! 
 
 A
 
 
 
 
 On Sat, Mar 15, 2014 at 2:00 PM, Srinivasan Ramani
 sriniv...@gmail.com wrote:
 
 Interjecting in a fantastic conversation... (Kudos to Avinash,
 Raphael and others for the efforts to mix/match AC-PC and
 administrative jurisdictions)..
 
 There is no direct containment of ACs within a district. Case in
 point is Delhi, where ACs don't fit single districts at all. 
 
 Avinash, 
 
 Trouble with the kind of political delimitation that you talk
 about is that... it doesn't really serve any purpose. With
 cross-determination of powers at various levels - blocks, wards,
 districts under the bureaucracy vis-a-vis MLAs - changing
 administrative jurisdictions doesn't make as much sense as
 doing direct gerrymandering for political vote-gaining. In other
 words, the administrative powers of an MLA are much too nebulous
 compared to those of district officials across the bureaucracy and the
 third tier of democracy. 
 
 
 On Sat, Mar 15, 2014 at 1:49 PM, Avinash Celestine
 avinash@gmail.com wrote:
 
 unfortunately you may be right... so that's another layer of
 complexity...
 
 On a slightly related note, I have often thought, though I
 don't know if it's actually possible in practice, that
 governments could do some delimitation on their own (for
 political purposes). For instance, if a village/area is near
 the border of a constituency, it's possible through an order
 to bring it under the administrative jurisdiction of a
 neighbouring district. If that district is then served by a
 different AC, you have effectively done some delimitation of
 your own, without actually calling it that.
 
 Given that delimitation papers don't specify individual
 villages in many cases, it seems entirely possible to do...
 
 looking forward to your dataset, Raphael!
 
 avinash
 
 
 On Sat, Mar 15, 2014 at 1:33 PM, Raphael Susewind
 li...@raphael-susewind.de wrote:
 
 Might well be the rule (I remember having read something like this,
 too), but the reality apparently differs (at least in the EC's own
 data)... Never depend on rules, check them! ;-)
 
 On 15.03.2014 08:58, Avinash Celestine wrote:
 thanks. the rule, as far as i remember, is that ACs are entirely
 contained within a district boundary. PCs, on the other hand, can span
 across district boundaries

Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-03-15 Thread Raphael Susewind
Hi Avinash and all,

I realized that each constituency falls within only one district in your
file, but there are constituencies that span several districts and vice
versa (rare, but it happens). I attached a list of those, extracted from
polling-station data on eci-polldaymonitoring.nic.in. These are ACs only;
naturally, the problem would proliferate if you aggregate to PCs.
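
The check itself is simple once polling-station records carry both an AC
and a district. A minimal sketch, assuming each record is a
(state, AC number, district) tuple; this layout and the function name
are illustrative and are not the format of the attached list or of the
ECI site.

```python
from collections import defaultdict


def acs_spanning_districts(records):
    """Return {(state, ac): sorted districts} for ACs seen in 2+ districts."""
    districts_by_ac = defaultdict(set)
    for state, ac, district in records:
        # Collect every district in which a booth of this AC appears.
        districts_by_ac[(state, ac)].add(district)
    return {
        key: sorted(districts)
        for key, districts in districts_by_ac.items()
        if len(districts) > 1
    }
```

Swapping the roles of AC and district in the same loop would give the
reverse view, that is, the districts that contain booths from several ACs.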

Hope it helps,
Raphael

On 15.03.2014 06:57, Avinash Celestine wrote:
 hi
 
 attached is an Excel file with AC-PC-district-state matching, along with
 codes for AC-PC. I can add census district codes if you like... give me
 a day or two
 
 some states are not present - like J&K... if someone could add those
 that would be great
 
 Avinash
 
 
 On Fri, Mar 14, 2014 at 10:27 PM, indro ray rayindro@gmail.com
 mailto:rayindro@gmail.com wrote:
 
 Hi Anand (Chitipothu),
 Can I know the source from where you get the polling booth and ward
 data? Is it individual for each state and does it provide the
 lat-long for the polling booths?
 
 Thanks,
 Indro
 
 
 On Wed, Mar 12, 2014 at 9:45 AM, Anand Chitipothu
 anandol...@gmail.com mailto:anandol...@gmail.com wrote:
 
 
 
 On Wed, Mar 12, 2014 at 8:19 AM, Siddarth Raman
 thriddas.ano...@gmail.com mailto:thriddas.ano...@gmail.com
 wrote:
 
 Hi All,
 
 In line with the discussions on elections, this is something
 I'd started working on a while back (and dropped). I was
 essentially hoping for a PC to AC to Ward mapping. As far as
 I understand, census 2011 has population data either at the
 level of the ward or the district, so if we had to run even
 rudimentary data analysis on a parliamentary or assembly
 constituency (like total population) accurately, I'm
 guessing we need to go bottom up.
 
 I had started this by attempting to
 convert 
 http://eci.nic.in/eci_main/CurrentElections/CONSOLIDATED_ORDER%20_ECI%20.pdf 
 into
 excel (using a mixture of pattern matching in notepad++ and
 a bit of excel vb). It's time consuming (largely because
 each state follows its own convention - not standardized)
 
 Any suggestions on how one might go about this? If I wanted
 to estimate the population in a parliamentary constituency,
 or the total households, or the urban/rural split, how would
 I go about it? Is there a better method than looking at the
 above demarcation notification? Are there datasets on this
 already?
 
 New to the group, didn't find any prior discussions on
 Parliamentary to Assembly to Ward/Village demarcations. 
 
 
 Hi Siddarth,
 
 The voter list PDFs have the ward info for each polling booth.
 The PDFs have the number of voters, but not the population. So it
 is possible to sum up those numbers to get a count of the number of
 voters in a PC or AC.
 
 If you want polling  booth to ward mapping, I'll be able to
 provide it.
 
 Anand
 

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Papers & Blog | http://www.raphael-susewind.de

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)


Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-03-15 Thread Raphael Susewind
 on Parliamentary to Assembly to Ward/Village
 demarcations. 
 
 
 Hi Siddarth,
 
 The voter list PDFs have the ward info for each polling
 booth. The PDFs have the number of voters, but not the
 population. So it is possible to sum up those numbers to get
 a count of the number of voters in a PC or AC.
 
 If you want polling  booth to ward mapping, I'll be able
 to provide it.
 
 Anand
 



Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-03-15 Thread Raphael Susewind
Might well be the rule (I remember having read something like this,
too), but the reality apparently differs (at least in the EC's own
data)... Never depend on rules, check them! ;-)
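
A check of this kind is easy to script. Here is a minimal sketch, assuming the polling-station list has been reduced to (district, AC) pairs; the sample rows and the function name are made up for illustration and are not taken from the EC data:

```python
from collections import defaultdict

# Hypothetical polling-station records: (district, assembly constituency).
# In real data the AC key should also carry the state, because AC
# numbers and names are only unique within a state.
stations = [
    ("District A", "AC 1"),
    ("District A", "AC 1"),
    ("District A", "AC 2"),
    ("District B", "AC 2"),  # AC 2 has booths in two districts
    ("District B", "AC 3"),
]


def acs_spanning_districts(records):
    """Return {AC: sorted districts} for every AC seen in more than one district."""
    districts_by_ac = defaultdict(set)
    for district, ac in records:
        districts_by_ac[ac].add(district)
    return {ac: sorted(ds) for ac, ds in districts_by_ac.items() if len(ds) > 1}


violations = acs_spanning_districts(stations)
print(violations)
```

An empty result would mean the "ACs stay within one district" rule holds for the data at hand; anything else is a list of exceptions to look at by hand.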

On 15.03.2014 08:58, Avinash Celestine wrote:
 thanks. the rule, as far as i remember, is that ACs are entirely
 contained within a district boundary. PCs, on the other hand, can span
 across district boundaries.
 
 A
 
 
 On Sat, Mar 15, 2014 at 1:19 PM, Raphael Susewind
 <li...@raphael-susewind.de> wrote:
 
 Hi Avinash and all,
 
 I realized that each constituency falls within only one district in your
 file, but there are constituencies that span several districts and vice
 versa (rare, but it happens). I attached a list of those, extracted from
 polling-station data on http://eci-polldaymonitoring.nic.in. These are
 ACs only; naturally the problem would proliferate if you aggregate to PC.
 
 Hope it helps,
 Raphael
 
 On 15.03.2014 06:57, Avinash Celestine wrote:
  hi
 
  attached an excel with AC-PC-district -states matching along with
 codes
  for AC-PC. I can add census district codes if you like...give me a day
  or two
 
  some states are not present - like JK... if someone could add those
  that would be great
 
  Avinash
 
 
  On Fri, Mar 14, 2014 at 10:27 PM, indro ray
  <rayindro@gmail.com> wrote:
 
  Hi Anand (Chitipothu),
  Can I know the source from where you get the polling booth and
 ward
  data? Is it individual for each state and does it provide the
  lat-long for the polling booths?
 
  Thanks,
  Indro
 
 
 

Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-03-13 Thread Raphael Susewind
Hey Avinash,

yep - that's what I figured, too. Not only misplaced matras (those could
be rearranged), but a real garbling, which cannot be resolved as far as
I can see. Worse, there isn't even a clear pattern - for a few
constituencies, I fed the Voter ID (which is in Latin script) to the
"search roll details by voter ID" function on the CEO website, which
returns the properly written Unicode name. I then compared the garbled
name and the Unicode name to see if there are any statistical
regularities - yet unfortunately, there are a thousand ways of garbling
"Avinash" - it's not always "Abniszhaa".
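
For anyone who wants to repeat that comparison, here is a rough sketch of one way to tally the differences, assuming you already have (garbled, correct) name pairs. It uses Python's difflib to align each pair; the function name and the single sample pair are only for illustration, and this is not the exact procedure I used:

```python
from collections import Counter
from difflib import SequenceMatcher


def substitution_counts(pairs):
    """Tally (garbled substring, correct substring) differences across name pairs."""
    counts = Counter()
    for garbled, correct in pairs:
        matcher = SequenceMatcher(None, garbled, correct, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                # An empty string on either side means a pure insertion or deletion.
                counts[(garbled[i1:i2], correct[j1:j2])] += 1
    return counts


# Illustrative pair only; real input would be garbled PDF text paired with
# the name returned by the voter ID lookup.
pairs = [("Abniszhaa", "Avinash")]
for (wrong, right), n in substitution_counts(pairs).most_common():
    print(repr(wrong), "->", repr(right), n)
```

If the garbling were regular, a handful of substitutions would dominate the tally; a long flat tail of one-off substitutions is what "no clear pattern" looks like.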

The only solution I can think of is the following (but I have not
implemented it): train Tesseract (an OCR engine that handles Indic
scripts) on the exact font used in the PDF reports, so that it almost
perfectly recognizes anything written in this font (this was a stumbling
block for me, rather complicated work), then extract images of the text
areas of interest, and run them through OCR. If you want to give it a
shot...

Otherwise, we could only try to convince the EC to fix the bug in
Crystal Reports, and re-generate all PDFs - which is highly unlikely,
they have more important things to do right now (the PDFs display and
print alright, after all, just text extraction does not work - they
would perhaps even consider it a feature rather than a bug).

It might be useful to compile a list of states where this problem occurs
- I have seen it in Gujarat and UP for sure, but don't know whether it
happens everywhere.

Best,
Raphael

On 13.03.2014 05:35, Avinash Celestine wrote:
 well i checked out the unicode table and it only confirms what we knew
 anyway... that there's duplication of unicode hex values for different
 characters... 
 
 So i guess its back to the drawing board.
 
 
 On Thu, Mar 13, 2014 at 9:43 AM, Avinash Celestine
 <avinash.celest...@gmail.com> wrote:
 
 Hi Raphael
 
 In fact the problem with the UP rolls is exactly what I am grappling
 with now. It seems to me that one way is to look at the exact
 mapping of Unicode characters embedded within the files. One way of
 generating such maps is to use a plugin like PDFlib's FontReporter,
 which works with Adobe Acrobat
 (http://www.pdflib.com/products/fontreporter/). Have you
 tried out this method and did it work for you? Do tell me if you (or
 anyone else) have given it a shot. I am planning to give it a go
 at least...
 
 I have attached a sample roll (of an AC in Agra), along with the
 generated font report if anyone wants to give it a look
 
 A closer look at the roll shows that the main problem seems to be
 with the Devanagari 'matras' which are not rendering correctly when
 you cut and paste
 
 regards
 
 Avinash
 
 

Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-03-12 Thread Raphael Susewind
Hey Siddarth, and Anand,

I, too, am really interested in this, but have not made much progress
yet. I think there are two ways to do this, neither of which is
straightforward.

The "extract the ward/village mentioned in the roll PDF" strategy is one
option. Depending on the raw data, this can however be cumbersome (one
source in the vernacular, one in Latin script, etc.); I know a couple of
scholars who attempt to do this and they are stuck all the time, having
had to match manually rather frequently (which is a pain given that
there are 800,000 or so polling stations).

Currently, we have the additional problem that many of the current roll
PDFs - for instance in UP - are broken: one cannot copy-paste (or
pdftotext, or extract through whatever means) from them, chiefly because
the ToUnicode CMap is corrupted by the version of Crystal Reports the ECI
is using. There is no real workaround other than reverse-OCR, which is a
pain-in-the-a**. Let me know if you figure out another way...
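
If you want to look at the mapping yourself, here is a small sketch, assuming you have already pulled the decoded ToUnicode CMap stream out of the PDF as plain text (how to do that is not shown). It only reads `bfchar` entries, ignores `bfrange`, and the sample CMap is invented. Several glyph codes pointing at one Unicode value is not an error in every font, but it is worth listing when text extraction comes out garbled:

```python
import re
from collections import defaultdict

BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.DOTALL)
BFCHAR_ENTRY = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def shared_unicode_targets(cmap_text):
    """Return {unicode hex: [glyph codes]} for targets used by more than one glyph."""
    glyphs_by_target = defaultdict(list)
    for block in BFCHAR_BLOCK.findall(cmap_text):
        for glyph, target in BFCHAR_ENTRY.findall(block):
            glyphs_by_target[target.upper()].append(glyph.upper())
    return {t: g for t, g in glyphs_by_target.items() if len(g) > 1}


# Invented sample: two different glyph codes both claim to be U+0915.
sample_cmap = """
3 beginbfchar
<0003> <0915>
<0004> <093F>
<0005> <0915>
endbfchar
"""

collisions = shared_unicode_targets(sample_cmap)
print(collisions)
```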

The second option would be a very different strategy, namely GIS
matching through nearest-neighbour analysis: what is the closest Census
village/ward to a particular polling booth (or the other way
round - the computational challenge is to match ALL booths to at least
one ward AND vice versa). Unfortunately, Census village/ward lat/long is
not in the public domain, as far as I can see - and using proprietary data
to do the matching is legally complicated (even if one redistributes
only the matching result and not the proprietary data).
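
To make the two-way matching concrete, here is a brute-force sketch, assuming both booths and wards are available as point coordinates. All IDs and coordinates below are hypothetical, and for 800,000 booths you would want a spatial index rather than this quadratic loop:

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest(sources, targets):
    """For each source id, return the id of the closest target (brute force)."""
    return {
        sid: min(targets, key=lambda tid: haversine_km(spos, targets[tid]))
        for sid, spos in sources.items()
    }


def match_both_ways(booths, wards):
    """(booth, ward) pairs such that every booth and every ward occurs at least once."""
    pairs = {(b, w) for b, w in nearest(booths, wards).items()}
    pairs |= {(b, w) for w, b in nearest(wards, booths).items()}
    return pairs


# Hypothetical coordinates, for illustration only.
booths = {"B1": (26.85, 80.95), "B2": (26.85, 81.05)}
wards = {"W1": (26.85, 80.96), "W2": (26.85, 81.04), "W3": (26.85, 81.95)}

matches = match_both_ways(booths, wards)
print(sorted(matches))
```

Taking the union of both directions is what guarantees that a remote ward such as W3, which is nobody's nearest ward, still ends up attached to some booth.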

My 5 cents,
Let us know of any progress,

Raphael

On 12.03.2014 05:17, Anand Chitipothu wrote:
 
 On Wed, Mar 12, 2014 at 9:45 AM, Anand Chitipothu
 <anandol...@gmail.com> wrote:
 
 
 
 On Wed, Mar 12, 2014 at 8:19 AM, Siddarth Raman
 <thriddas.ano...@gmail.com> wrote:
 
 Hi All,
 
 In line with the discussions on elections, this is something I'd
 started working on a while back (and dropped). I was essentially
 hoping for a PC to AC to Ward mapping. As far as I understand,
 census 2011 has population data either at the level of the ward
 or the district, so if we had to run even rudimentary data
 analysis on a parliamentary or assembly constituency (like total
 population) accurately, I'm guessing we need to go bottom up.
 
 I had started this by attempting to
 convert 
 http://eci.nic.in/eci_main/CurrentElections/CONSOLIDATED_ORDER%20_ECI%20.pdf 
 into
 excel (using a mixture of pattern matching in notepad++ and a
 bit of excel vb). It's time consuming (largely because each
 state follows its own convention - not standardized)
 
 Any suggestions on how one might go about this? If I wanted to
 estimate the population in a parliamentary constituency, or the
 total households, or the urban/rural split, how would I go about
 it? Is there a better method than looking at the above
 demarcation notification? Are there datasets on this already?
 
 New to the group, didn't find any prior discussions on
 Parliamentary to Assembly to Ward/Village demarcations. 
 
 
 Hi Siddarth,
 
 The voter list PDFs have the ward info for each polling booth. The
 PDFs have the number of voters, but not the population. So it is
 possible to sum up those numbers to get a count of the number of voters
 in a PC or AC.
 
 If you want a polling booth to ward mapping, I'll be able to provide it.
 
 
 btw, Anand Doshi has already parsed that PDF. The results are available at:
 
 https://gist.github.com/anandpdoshi/9448203
 
 Anand
 P.S: uff, so many Anands on this list
 



Re: [datameet] India 2001 census data village-wise

2014-03-03 Thread Raphael Susewind
Hey,

this was sent over the list a few days ago:

Hi all,

I know some census 2011 data has been posted already, but I thought I'd
share the primary abstract data I have down to the town/village level.
You can download it here: http://journeyman-data.com/census2011/. Please
see the variable list/readme for details.

Best,
Eric Dodge

On 01.03.2014 06:23, Fenella C wrote:
 Hello everyone, 
 
 I am wondering if any of you have the village-wise 2001 Indian census
 data in a spreadsheet (or similar) format? I am basically looking for
 information at the village level from the 2001 census (e.g., population
 of the village, number of households in the village, etc.)
 
 The data is available online at the census website
 here 
 http://www.censusindia.gov.in/Census_Data_2001/Village_Directory/View_data/Village_Profile.aspx
 but it is not available in a spreadsheet. I have already tried web
 scraping the data, but it is painfully slow, so I'm wondering if I can
 find it elsewhere.
 
 Many thanks,
 Fenella
 



Re: [datameet] Polling stations

2014-01-09 Thread Raphael Susewind
Dear Anand,

I should probably first read properly then respond... If you are not
after the GIS data per se, you should be able to get the data in either

a) the Form 20 returns of past elections on each CEO site

b) [perhaps more useful] the "Download Electoral Roll as PDF" databases
of each CEO - you don't have to scrape the actual PDFs but could just
use the information in the dropdown lists they usually provide
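
For option b), reading the dropdown lists can be sketched with nothing but the standard library. This assumes you have already fetched the HTML of a CEO page; the sample markup, field name and class name below are invented, since every state's site is laid out differently:

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect (value, label) pairs from every <option> element in a page."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._value = None
        self._label = []

    def _flush(self):
        # Store the option currently being read, if any.
        if self._value is not None:
            self.options.append((self._value, "".join(self._label).strip()))
            self._value = None

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._flush()  # copes with pages that omit </option>
            self._value = dict(attrs).get("value", "")
            self._label = []

    def handle_data(self, data):
        if self._value is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self._flush()


# Invented sample markup standing in for a real CEO page.
sample_html = """
<select name="ac">
  <option value="0">--Select--</option>
  <option value="1">1 - Example AC</option>
</select>
"""

parser = OptionCollector()
parser.feed(sample_html)
print(parser.options)
```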

Best,
Raphael

On 09.01.2014 09:08, Anand Chitipothu wrote:
 Hi,
 
 I'm looking for information like name, constituency, number of voters
 etc. of all polling stations in India. Has someone already scraped this
 data?
 
 The names of the polling stations are available at:
 
 http://www.eci-polldaymonitoring.nic.in/psl/default.aspx
 
 Are there any other places where this information is available?
 
 Anand
 



Re: [datameet] Polling stations

2014-01-09 Thread Raphael Susewind
Hey Anand,

I have done quite a bit of work on this. One problem is, there are two
datasets - one cleaned up, one preliminary:

http://www.eci-polldaymonitoring.nic.in/psl/default.aspx
http://www.eci-polldaymonitoring.nic.in/psleci/default.aspx

For the preliminary data for UP, have a look at my website here:
http://data.raphael-susewind.de/content/gis-shapefiles

For all-India data, I have the preliminary raw point data, but can't
make up my mind whether I should clean it up and make it available now,
or hope that the EC themselves will clean it up further in the current
run-up to the general elections, in which case I could save myself the
trouble and just wait a few months longer.

Also, I am wary of the polling booth ID codes at the moment; for UP, for
instance, they changed slightly with the current roll revision - booth
IDs from 2011-13 are not necessarily the same as those in 2014.
Currently, my website operates with 2011-13 IDs, and I intend to wait a
little longer until I upgrade to 2014 IDs...
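
If you have booth lists from two roll revisions, a quick comparison shows how much the IDs have drifted. This is a minimal sketch, assuming each list is a mapping from booth ID to booth name within one constituency; all IDs and names are hypothetical:

```python
def compare_booth_ids(old, new):
    """Compare two {booth_id: booth_name} mappings from different roll revisions."""
    old_ids, new_ids = set(old), set(new)
    return {
        "dropped": sorted(old_ids - new_ids),
        "added": sorted(new_ids - old_ids),
        # Same ID but a different name: the ID may have been reassigned.
        "renamed": sorted(i for i in old_ids & new_ids if old[i] != new[i]),
    }


# Hypothetical booth lists for one constituency.
booths_2013 = {
    "101": "Primary School North",
    "102": "Primary School South",
    "103": "Panchayat Bhawan",
}
booths_2014 = {
    "101": "Primary School North",
    "102": "Panchayat Bhawan",
    "104": "Community Hall",
}

report = compare_booth_ids(booths_2013, booths_2014)
print(report)
```

Anything that shows up under "renamed" is the dangerous case, because a join on booth ID would silently attach old locations to the wrong booth.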

And nope, there are no other places where this data is available to my
knowledge (unless you know somebody deep inside the NIC or EC),

Best,
Raphael

On 09.01.2014 09:08, Anand Chitipothu wrote:
 Hi,
 
 I'm looking for information like name, constituency, number of voters
 etc. of all polling stations in India. Has someone already scraped this
 data?
 
 The names of the polling stations are available at:
 
 http://www.eci-polldaymonitoring.nic.in/psl/default.aspx
 
 Are there any other places where this information is available?
 
 Anand
 
