Edit report at https://bugs.php.net/bug.php?id=64740&edit=1

 ID:                 64740
 Updated by:         [email protected]
 Reported by:        RQuadling at GMail dot com
 Summary:            Gender ignores country for some names.
-Status:             Wont fix
+Status:             Analyzed
 Type:               Bug
 Package:            PECL
 Operating System:   Centos
 PHP Version:        Irrelevant
 Assigned To:        ab
 Block user comment: N
 Private report:     N

 New Comment:

But that would mean these are full names, whereas what I mean is that they are nicknames, though I'm not a native speaker :) Looking at

http://en.wikipedia.org/wiki/Ben
http://en.wikipedia.org/wiki/Daniel_(name)

any person I click on turns out to be a Benjamin or a Daniel. Can the short variant really be given as a full name, for example on a birth certificate?

There are two terms in the ext: nickname and full name. The difference is that if you type 'Dan' literally, the library thinks that you're looking for the full name Dan, not for Daniel. Dan itself is recorded as a term of endearment for Daniel. The further handling is of course not correct; as I see it, it should say 'not found' and not try to evaluate all the countries.

I ask because the quality of the data is important. If you, as a native speaker, are sure the shorter variant can be a full name, I should correct the data. Otherwise I'd prefer to implement it the way I've mentioned: check whether the input is a nickname, get the full name, and work with that instead. Because there are cases where the same nickname can point to either a male or a female full name, changing the library behavior in this case would be wrong.

Thanks.


Previous Comments:
------------------------------------------------------------------------
[2013-05-18 20:54:38] RQuadling at GMail dot com

The issue for me is that even if the name is female in one country, it is a single country that I'm asking for. Dan, in England, _IS_ a male name. And the data does show this. As things stand, supplying the country is redundant.
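
The nickname handling described in the new comment above (resolve a nickname to its full name, but only when the mapping is unambiguous) could be sketched roughly as follows. This is a minimal illustration, not the extension's code: the `$nicknames` table and the `resolveNick()` helper are hypothetical names, and the table holds only the Ben and Dan examples discussed in this thread. It assumes PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical nickname table for illustration; not the extension's data.
// Each nickname maps to every full name it may stand for.
$nicknames = [
    'Dan' => ['Daniel'],
    'Ben' => ['Benjamin', 'Benedict'],
];

// Hypothetical helper: returns the full name for an unambiguous nickname,
// the input itself when it is not a known nickname, and null when the
// nickname could stand for more than one full name.
function resolveNick(string $name, array $nicknames): ?string
{
    $candidates = $nicknames[$name] ?? null;
    if ($candidates === null) {
        return $name; // not a known nickname: use the input as given
    }
    // Resolve only when exactly one full name is recorded.
    return count($candidates) === 1 ? $candidates[0] : null;
}

var_dump(resolveNick('Dan', $nicknames));     // string(6) "Daniel"
var_dump(resolveNick('Ben', $nicknames));     // NULL
var_dump(resolveNick('Richard', $nicknames)); // string(7) "Richard"
```

Returning null for the ambiguous case leaves the decision to the caller, which matches the concern that one nickname may point to both a male and a female full name.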
------------------------------------------------------------------------
[2013-05-18 17:22:05] [email protected]

To be more clear, the way I see it should be solved is

$g = new Gender\Gender;
if ($g->isNick($name)) {
    $name = $g->getNameForNick($name); // to implement
}
$gender = $g->get($name);

That could work as long as there is an unambiguous correlation between the name and the nick. What would you say?

------------------------------------------------------------------------
[2013-05-18 17:03:27] [email protected]

Richard,

after some research I have come to the conclusion that this is not a bug. Strictly speaking, both Dan and Ben aren't names but nicknames, for Daniel and Benjamin respectively. Looking into the data, there are two corresponding lines

= Ben Benjamin 1 1
= Dan Daniel 111 1

That means in both cases it is male in Britain. However, because the exact input was the nickname, not the real name, the library looks up and evaluates the input literally as given and compares the frequencies in all the other countries.

I think a change in this area could break the data integrity for normal operations. Where Ben can evaluate to Benjamin, it could also evaluate to Benedict. I am thinking more of adding a new method like Gender::getRealName($nickname), or adding some options to the existing get method. Changing that behaviour globally might break other, more complicated cases.

Any ideas?

Thanks )

------------------------------------------------------------------------
[2013-04-30 08:37:50] RQuadling at GMail dot com

Description:
------------
The Gender extension ignores the country when the requested name is male or female (trying not to mention the word uni- es ee ex as I think this is causing a SPAM alert when posting bugs) in a country where the name is known, even though it is male in the requested country.

Also had to post this as a PECL bug as there is no PECL/Gender entry to choose.

Test script:
---------------
<?php
$o_Gender = new Gender\Gender;
$o_Gender->trace();
var_dump($o_Gender->get('Ben', Gender\Gender::BRITAIN));
// var_dump($o_Gender->get('Dan', Gender\Gender::BRITAIN));
// var_dump($o_Gender->get('Richard', Gender\Gender::BRITAIN));

Expected result:
----------------
Searching for name 'Ben' (country = Great Britain)
Range = line 1 - 48891, guess = 24446 ('Kyung+Ju')
Range = line 1 - 24445, guess = 12223 ('Esben')
Range = line 1 - 12222, guess = 6111 ('Brendon')
Range = line 1 - 6110, guess = 3055 ('Aranita')
Range = line 3056 - 6110, guess = 4583 ('Barak')
Range = line 4584 - 6110, guess = 5347 ('Beybala')
Range = line 4584 - 5346, guess = 4965 ('Benet')
Range = line 4584 - 4964, guess = 4774 ('Bavani')
Range = line 4775 - 4964, guess = 4869 ('Behudin')
Range = line 4870 - 4964, guess = 4917 ('Belk<i>s')
Range = line 4918 - 4964, guess = 4941 ('Bendina')
Range = line 4918 - 4940, guess = 4929 ('Belu<sch>e')
Range = line 4930 - 4940, guess = 4935 ('Benan')
Range = line 4930 - 4934, guess = 4932 ('Belva')
Range = line 4933 - 4934, guess = 4933 ('Ben')
Result: name 'Ben' found
evaluating name 'Ben': 'is male' (country = Great Britain[3] or Ireland[1] or U.S.A.[3] or Belgium[4] or the Netherlands[7])
evaluating name 'Ben': 'is uni*** name' (country = China[3])
result for 'Ben': 'is male name'
int(77)
Searching for name 'Dan' (country = Great Britain)
Range = line 1 - 48891, guess = 24446 ('Kyung+Ju')
Range = line 1 - 24445, guess = 12223 ('Esben')
Range = line 1 - 12222, guess = 6111 ('Brendon')
Range = line 6112 - 12222, guess = 9167 ('Delfa')
Range = line 6112 - 9166, guess = 7639 ('Chrysostomia')
Range = line 7640 - 9166, guess = 8403 ('Curzio')
Range = line 8404 - 9166, guess = 8785 ('Danu?e')
Range = line 8404 - 8784, guess = 8594 ('Dalbir')
Range = line 8595 - 8784, guess = 8689 ('Dan Daniel')
Range = line 8595 - 8688, guess = 8641 ('Dalva')
Range = line 8642 - 8688, guess = 8665 ('Damion')
Range = line 8666 - 8688, guess = 8677 ('Dan')
Range = line 8666 - 8677, guess = 8671 ('Damnjan')
Range = line 8672 - 8677, guess = 8674 ('Damyan')
Range = line 8675 - 8677, guess = 8676 ('Dan')
Range = line 8675 - 8676, guess = 8675 ('Damyanti')
Range = line 8676 - 8676, guess = 8676 ('Dan')
Result: name 'Dan' found
evaluating name 'Dan': 'is male' (country = Great Britain[2] or Ireland[3] or U.S.A.[4] or Belgium[1] or Luxembourg[4] or the Netherlands[1] or Swiss[1] or Denmark[6] or Norway[2] or Sweden[6] or Finland[3] or Romania[8] or Moldova[6] or Israel[7])
evaluating name 'Dan': 'is mostly male' (country = Vietnam[6])
evaluating name 'Dan': 'is uni*** name' (country = China[7])
result for 'Dan': 'is male name'
int(77)

Actual result:
--------------
Searching for name 'Ben' (country = Great Britain)
Range = line 1 - 48891, guess = 24446 ('Kyung+Ju')
Range = line 1 - 24445, guess = 12223 ('Esben')
Range = line 1 - 12222, guess = 6111 ('Brendon')
Range = line 1 - 6110, guess = 3055 ('Aranita')
Range = line 3056 - 6110, guess = 4583 ('Barak')
Range = line 4584 - 6110, guess = 5347 ('Beybala')
Range = line 4584 - 5346, guess = 4965 ('Benet')
Range = line 4584 - 4964, guess = 4774 ('Bavani')
Range = line 4775 - 4964, guess = 4869 ('Behudin')
Range = line 4870 - 4964, guess = 4917 ('Belk<i>s')
Range = line 4918 - 4964, guess = 4941 ('Bendina')
Range = line 4918 - 4940, guess = 4929 ('Belu<sch>e')
Range = line 4930 - 4940, guess = 4935 ('Benan')
Range = line 4930 - 4934, guess = 4932 ('Belva')
Range = line 4933 - 4934, guess = 4933 ('Ben')
Result: name 'Ben' found
evaluating name 'Ben': 'is male' (country = Great Britain[3] or Ireland[1] or U.S.A.[3] or Belgium[4] or the Netherlands[7])
evaluating name 'Ben': 'is uni*** name' (country = China[3])
result for 'Ben': 'is uni*** name'
int(63)
Searching for name 'Dan' (country = Great Britain)
Range = line 1 - 48891, guess = 24446 ('Kyung+Ju')
Range = line 1 - 24445, guess = 12223 ('Esben')
Range = line 1 - 12222, guess = 6111 ('Brendon')
Range = line 6112 - 12222, guess = 9167 ('Delfa')
Range = line 6112 - 9166, guess = 7639 ('Chrysostomia')
Range = line 7640 - 9166, guess = 8403 ('Curzio')
Range = line 8404 - 9166, guess = 8785 ('Danu?e')
Range = line 8404 - 8784, guess = 8594 ('Dalbir')
Range = line 8595 - 8784, guess = 8689 ('Dan Daniel')
Range = line 8595 - 8688, guess = 8641 ('Dalva')
Range = line 8642 - 8688, guess = 8665 ('Damion')
Range = line 8666 - 8688, guess = 8677 ('Dan')
Range = line 8666 - 8677, guess = 8671 ('Damnjan')
Range = line 8672 - 8677, guess = 8674 ('Damyan')
Range = line 8675 - 8677, guess = 8676 ('Dan')
Range = line 8675 - 8676, guess = 8675 ('Damyanti')
Range = line 8676 - 8676, guess = 8676 ('Dan')
Result: name 'Dan' found
evaluating name 'Dan': 'is male' (country = Great Britain[2] or Ireland[3] or U.S.A.[4] or Belgium[1] or Luxembourg[4] or the Netherlands[1] or Swiss[1] or Denmark[6] or Norway[2] or Sweden[6] or Finland[3] or Romania[8] or Moldova[6] or Israel[7])
evaluating name 'Dan': 'is mostly male' (country = Vietnam[6])
evaluating name 'Dan': 'is uni*** name' (country = China[7])
result for 'Dan': 'is uni*** name'
int(63)

------------------------------------------------------------------------
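
The gap between the expected and actual results for 'Ben' can be modelled with a small sketch. It is only an illustration of the behaviour the reporter asks for, not the extension's implementation: `evaluate()`, the `SKETCH_*` constants and the array layout are hypothetical, the per-country classifications are copied from the trace above, and the fallback rule (mixed classifications count as the 63 result) is an assumption standing in for whatever the library does across countries. It assumes PHP 7.1 or later.

```php
<?php
// Illustrative model only; this is not the extension's code or data format.
// Result codes as they appear in the traces of this report.
const SKETCH_MALE = 77;   // reported as 'is male name'
const SKETCH_UNISEX = 63; // reported as 'is uni*** name'

// Per-country classifications for 'Ben', copied from the trace output.
$ben = [
    'Great Britain'   => SKETCH_MALE,
    'Ireland'         => SKETCH_MALE,
    'U.S.A.'          => SKETCH_MALE,
    'Belgium'         => SKETCH_MALE,
    'the Netherlands' => SKETCH_MALE,
    'China'           => SKETCH_UNISEX,
];

// Hypothetical helper: a requested country that has its own entry decides
// the result alone; only without such an entry are all countries combined.
function evaluate(array $byCountry, ?string $country = null): ?int
{
    if ($country !== null && isset($byCountry[$country])) {
        return $byCountry[$country];
    }
    // Assumed fallback: one shared classification wins, mixed ones
    // collapse to the unisex code.
    $distinct = array_unique(array_values($byCountry));
    if (count($distinct) === 0) {
        return null; // no data for this name at all
    }
    return count($distinct) === 1 ? reset($distinct) : SKETCH_UNISEX;
}

var_dump(evaluate($ben, 'Great Britain')); // int(77)
var_dump(evaluate($ben));                  // int(63)
```

With the country honoured, the sketch returns 77 as in the expected result. Ignoring it reproduces the 63 shown in the actual result.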
