Bug#271397: enamdict: add frequency statistic

2011-08-11 Thread Jim Breen
I would be quite happy to add some sort of frequency metric to given and family names in the ENAMDICT file. The trouble is I have no time spare to go digging out the data. If someone else were prepared to compile it, I'd be glad to add it.
Jim Breen
2011/8/11 Osamu Aoki os...@debian.org: Hi,

Bug#271397: enamdict: add frequency statistic

2011-08-11 Thread Osamu Aoki
Hi,
On Thu, Aug 11, 2011 at 06:00:55PM +1000, Jim Breen wrote:
I would be quite happy to add some sort of frequency metric to given and family names in the ENAMDICT file. The trouble is I have no time spare to go digging out the data.
I have found data, as below, in CSV format for family

Bug#271397: enamdict: add frequency statistic

2011-08-11 Thread Jim Breen
Good evening,
2011/8/11 Osamu Aoki os...@debian.org:
I have found data, as below, in CSV format for family names. Anyway, the raw data has a bit over 100,600 names. Given names are a bit difficult.
Yes, but family names are a great start. It looks like
sei,rank,number
佐藤,1位,481980
鈴木,2位,426804
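For anyone wanting to compile this, a minimal Python sketch of reading data in the sei,rank,number layout quoted above. The function name parse_family_names and the inline SAMPLE string are made up for illustration, and it assumes every rank field ends in 位 as in the two rows shown; the real file may differ.

```python
import csv
import io

# Two rows copied from the excerpt above; the real file reportedly
# has a bit over 100,600 names.
SAMPLE = """sei,rank,number
佐藤,1位,481980
鈴木,2位,426804
"""

def parse_family_names(text):
    """Map each family name (sei) to a (rank, number) tuple.

    Hypothetical helper: assumes the header row is sei,rank,number
    and that rank values carry a trailing 位 suffix.
    """
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        rank = int(row["rank"].rstrip("位"))  # "1位" -> 1
        result[row["sei"]] = (rank, int(row["number"]))
    return result

freq = parse_family_names(SAMPLE)
for sei, (rank, number) in freq.items():
    print(sei, rank, number)
```

The resulting mapping could then be joined against ENAMDICT family-name entries by the kanji form; how the metric would be stored in the file is left open here.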

Bug#271397: enamdict: add frequency statistic

2011-08-10 Thread Osamu Aoki
Hi,
This is about: http://bugs.debian.org/271397
Mr. Tashiro is quite obvious. (% of population using the name, popularity rank)
田代 (0.061%, #287) - I pick this without a second thought.
田城 (0.001%, #6981) - the mozc Japanese input method listed this too.
Not that popular names but this names popular than 田代