Edit report at https://bugs.php.net/bug.php?id=64740&edit=1

 ID:                 64740
 User updated by:    RQuadling at GMail dot com
 Reported by:        RQuadling at GMail dot com
 Summary:            Gender ignores country for some names.
 Status:             Analyzed
 Type:               Bug
 Package:            PECL
 Operating System:   Centos
 PHP Version:        Irrelevant
 Assigned To:        ab
 Block user comment: N
 Private report:     N

 New Comment:

But even if the name is a nickname, it is still a MALE nickname in England. The
country option isn't doing its job.


Previous Comments:
------------------------------------------------------------------------
[2013-05-18 21:45:54] [email protected]

But that would mean it's the full name; what I mean is that it's a nickname,
though I'm not a native speaker :) Looking here:

http://en.wikipedia.org/wiki/Ben
http://en.wikipedia.org/wiki/Daniel_(name)

every person I click on turns out to be a Benjamin or a Daniel. Is it really the
case that the short variant can be given as a full name, e.g. on a birth
certificate?

There are two terms in the extension: nickname and full name. The difference is
that if you type literally 'Dan', the library thinks you're looking for the full
name Dan, not for Daniel, as Dan is recorded as a term of endearment for Daniel.
The further handling is of course not correct; as I see it, it should say 'not
found' and not try to evaluate all the countries.

The reason I ask is that the quality of the data is important. If you, as a
native speaker, are sure the shorter variant can be a full name, I should correct
the data. Otherwise I'd prefer to implement it the way I've mentioned: check
whether it's a nickname, get the full name and work with that instead. Because
there are cases where the same nickname can point to either a male or a female
full name, changing the library behaviour in this case would be wrong.

Thanks.

------------------------------------------------------------------------
[2013-05-18 20:54:38] RQuadling at GMail dot com

The issue for me is that, even if the name is female in one country, I'm asking
about a single country. Dan, in England, _IS_ a male name, and the data does show
this.

As things stand, supplying the country is redundant.
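A minimal sketch of the behaviour being asked for here: when a country is supplied, the classification recorded for that country wins, and a cross-country rule is used only as a fallback. The table, the function name `classifyForCountry()` and the fallback rule (any disagreement between countries gives '?') are hypothetical; the rule is a deliberately simplified stand-in for the frequency comparison the extension actually performs. The sample data mirrors the 'Ben' trace, and the return values use ord('M') = 77 and ord('?') = 63, which appear to match the int(77) and int(63) seen in the traces.

```php
<?php
// Hypothetical per-country classifications for 'Ben', taken from the trace:
// 'M' = male, '?' = unisex. This is NOT the extension's data format.
const BEN_BY_COUNTRY = [
    'Great Britain'   => 'M',
    'Ireland'         => 'M',
    'U.S.A.'          => 'M',
    'Belgium'         => 'M',
    'the Netherlands' => 'M',
    'China'           => '?',
];

// Hypothetical helper: prefer the requested country's classification.
// Only when no country is given (or it has no entry) fall back to a
// simplified cross-country rule where any disagreement yields '?'.
function classifyForCountry(array $byCountry, ?string $country): int
{
    if ($country !== null && isset($byCountry[$country])) {
        return ord($byCountry[$country]);
    }
    // array_values() first, so the single surviving value sits at key 0.
    $distinct = array_unique(array_values($byCountry));
    return count($distinct) === 1 ? ord($distinct[0]) : ord('?');
}

var_dump(classifyForCountry(BEN_BY_COUNTRY, 'Great Britain')); // int(77)
var_dump(classifyForCountry(BEN_BY_COUNTRY, null));            // int(63)
```

With this design, passing Gender\Gender::BRITAIN would no longer be redundant: it selects the British entry before any other country is consulted.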

------------------------------------------------------------------------
[2013-05-18 17:22:05] [email protected]

To be clearer, the way I think it should be solved is:

$g = new Gender\Gender;
if($g->isNick($name)) {
    $name = $g->getNameForNick($name); // to implement
}
$gender = $g->get($name);

That could work as long as there is an unambiguous correlation between the name
and the nickname. What do you think?
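The proposal above can be sketched in plain PHP, assuming a hypothetical nickname table in which each nickname lists every full name it may stand for. `resolveNickname()` and `NICKNAMES` are illustrative only (getNameForNick() does not exist yet, and the real data may differ). The point is that a nickname is replaced only when the mapping is unambiguous; 'Benedict' is included to model the Ben → Benjamin/Benedict ambiguity raised in this thread.

```php
<?php
// Hypothetical nickname table: nickname => list of full names it may abbreviate.
const NICKNAMES = [
    'Ben' => ['Benjamin', 'Benedict'],
    'Dan' => ['Daniel'],
];

// Return the full name when the nickname maps to exactly one full name;
// otherwise return the input unchanged so the caller can decide what to do.
function resolveNickname(string $name, array $table = NICKNAMES): string
{
    $candidates = $table[$name] ?? [];
    return count($candidates) === 1 ? $candidates[0] : $name;
}

echo resolveNickname('Dan'), "\n";     // Daniel
echo resolveNickname('Ben'), "\n";     // Ben (ambiguous, left as-is)
echo resolveNickname('Richard'), "\n"; // Richard (not a nickname)
```

Returning the input unchanged for ambiguous or unknown names keeps the existing lookup path intact, which is the concern about not breaking other cases.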

------------------------------------------------------------------------
[2013-05-18 17:03:27] [email protected]

Richard,

after some research I've come to the conclusion that this is not a bug.

Strictly speaking, Dan and Ben aren't names but nicknames, for Daniel and
Benjamin respectively. Looking into the data, there are two corresponding lines:

=  Ben Benjamin               1 1
=  Dan Daniel                 111           1

That means in both cases the name is male in Britain. However, because the exact
input was the nickname rather than the real name, the library looks up and
evaluates the literal input and compares the frequencies in all the other
countries.
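Going by the description above, a data line beginning with `=` pairs a nickname with its full name, followed by frequency columns. A sketch of collecting such pairs into a nickname table, assuming the two names are the first two whitespace-separated tokens after the `=` (multi-part names in the dictionary appear to use `+`, as in 'Kyung+Ju' in the traces). `parseNicknameLines()` is a hypothetical helper and does not interpret the frequency columns at all.

```php
<?php
// Hypothetical parser: collect "= <nickname> <fullname> ..." lines into
// nickname => [full names]. Frequency columns after the names are ignored.
function parseNicknameLines(array $lines): array
{
    $table = [];
    foreach ($lines as $line) {
        if (substr($line, 0, 1) !== '=') {
            continue; // not a nickname/full-name pairing line
        }
        $tokens = preg_split('/\s+/', trim(substr($line, 1)));
        if (count($tokens) < 2) {
            continue; // malformed: need both a nickname and a full name
        }
        [$nick, $full] = $tokens;
        $table[$nick][] = $full; // one nickname may map to several full names
    }
    return $table;
}

$table = parseNicknameLines([
    '=  Ben Benjamin               1 1',
    '=  Dan Daniel                 111           1',
]);
echo $table['Dan'][0], "\n"; // Daniel
```

Appending to a list per nickname, rather than overwriting, preserves cases where one nickname points at several full names.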

I think a change in this area could break data integrity for normal operations:
where Ben can evaluate to Benjamin, it could also evaluate to Benedict. I'm
thinking more about adding a new method like Gender::getRealName($nickname), or
adding some options to the existing get method. Changing that behaviour globally
might break other, more complicated cases.

Any ideas?

Thanks )

------------------------------------------------------------------------
[2013-04-30 08:37:50] RQuadling at GMail dot com

Description:
------------
The Gender extension ignores the country when the requested name is male or
female (I'm trying not to write the word uni-s-e-x, as I think it triggers the
spam alert when posting bugs) in another country where the name is known, even
though it is male in the requested country.

I also had to file this as a generic PECL bug, as there is no PECL/Gender entry
to choose.

Test script:
---------------
<?php
$o_Gender = new Gender\Gender;
$o_Gender->trace();
var_dump($o_Gender->get('Ben', Gender\Gender::BRITAIN));
// var_dump($o_Gender->get('Dan', Gender\Gender::BRITAIN));
// var_dump($o_Gender->get('Richard', Gender\Gender::BRITAIN));


Expected result:
----------------
Searching for name 'Ben'  (country = Great Britain)
Range = line 1 - 48891,  guess = 24446 ('Kyung+Ju')
Range = line 1 - 24445,  guess = 12223 ('Esben')
Range = line 1 - 12222,  guess = 6111 ('Brendon')
Range = line 1 - 6110,  guess = 3055 ('Aranita')
Range = line 3056 - 6110,  guess = 4583 ('Barak')
Range = line 4584 - 6110,  guess = 5347 ('Beybala')
Range = line 4584 - 5346,  guess = 4965 ('Benet')
Range = line 4584 - 4964,  guess = 4774 ('Bavani')
Range = line 4775 - 4964,  guess = 4869 ('Behudin')
Range = line 4870 - 4964,  guess = 4917 ('Belk<i>s')
Range = line 4918 - 4964,  guess = 4941 ('Bendina')
Range = line 4918 - 4940,  guess = 4929 ('Belu<sch>e')
Range = line 4930 - 4940,  guess = 4935 ('Benan')
Range = line 4930 - 4934,  guess = 4932 ('Belva')
Range = line 4933 - 4934,  guess = 4933 ('Ben')
Result: name 'Ben' found
evaluating name 'Ben':  'is male'  (country = Great Britain[3] or Ireland[1] or U.S.A.[3] or Belgium[4] or the Netherlands[7])
evaluating name 'Ben':  'is uni*** name'  (country = China[3])
result for 'Ben':  'is male name'
int(77)

Searching for name 'Dan'  (country = Great Britain)
Range = line 1 - 48891,  guess = 24446 ('Kyung+Ju')
Range = line 1 - 24445,  guess = 12223 ('Esben')
Range = line 1 - 12222,  guess = 6111 ('Brendon')
Range = line 6112 - 12222,  guess = 9167 ('Delfa')
Range = line 6112 - 9166,  guess = 7639 ('Chrysostomia')
Range = line 7640 - 9166,  guess = 8403 ('Curzio')
Range = line 8404 - 9166,  guess = 8785 ('Danu?e')
Range = line 8404 - 8784,  guess = 8594 ('Dalbir')
Range = line 8595 - 8784,  guess = 8689 ('Dan Daniel')
Range = line 8595 - 8688,  guess = 8641 ('Dalva')
Range = line 8642 - 8688,  guess = 8665 ('Damion')
Range = line 8666 - 8688,  guess = 8677 ('Dan')
Range = line 8666 - 8677,  guess = 8671 ('Damnjan')
Range = line 8672 - 8677,  guess = 8674 ('Damyan')
Range = line 8675 - 8677,  guess = 8676 ('Dan')
Range = line 8675 - 8676,  guess = 8675 ('Damyanti')
Range = line 8676 - 8676,  guess = 8676 ('Dan')
Result: name 'Dan' found
evaluating name 'Dan':  'is male'  (country = Great Britain[2] or Ireland[3] or U.S.A.[4] or Belgium[1] or Luxembourg[4] or the Netherlands[1] or Swiss[1] or Denmark[6] or Norway[2] or Sweden[6] or Finland[3] or Romania[8] or Moldova[6] or Israel[7])
evaluating name 'Dan':  'is mostly male'  (country = Vietnam[6])
evaluating name 'Dan':  'is uni*** name'  (country = China[7])
result for 'Dan':  'is male name'
int(77)

Actual result:
--------------
Searching for name 'Ben'  (country = Great Britain)
Range = line 1 - 48891,  guess = 24446 ('Kyung+Ju')
Range = line 1 - 24445,  guess = 12223 ('Esben')
Range = line 1 - 12222,  guess = 6111 ('Brendon')
Range = line 1 - 6110,  guess = 3055 ('Aranita')
Range = line 3056 - 6110,  guess = 4583 ('Barak')
Range = line 4584 - 6110,  guess = 5347 ('Beybala')
Range = line 4584 - 5346,  guess = 4965 ('Benet')
Range = line 4584 - 4964,  guess = 4774 ('Bavani')
Range = line 4775 - 4964,  guess = 4869 ('Behudin')
Range = line 4870 - 4964,  guess = 4917 ('Belk<i>s')
Range = line 4918 - 4964,  guess = 4941 ('Bendina')
Range = line 4918 - 4940,  guess = 4929 ('Belu<sch>e')
Range = line 4930 - 4940,  guess = 4935 ('Benan')
Range = line 4930 - 4934,  guess = 4932 ('Belva')
Range = line 4933 - 4934,  guess = 4933 ('Ben')
Result: name 'Ben' found
evaluating name 'Ben':  'is male'  (country = Great Britain[3] or Ireland[1] or U.S.A.[3] or Belgium[4] or the Netherlands[7])
evaluating name 'Ben':  'is uni*** name'  (country = China[3])
result for 'Ben':  'is uni*** name'
int(63)

Searching for name 'Dan'  (country = Great Britain)
Range = line 1 - 48891,  guess = 24446 ('Kyung+Ju')
Range = line 1 - 24445,  guess = 12223 ('Esben')
Range = line 1 - 12222,  guess = 6111 ('Brendon')
Range = line 6112 - 12222,  guess = 9167 ('Delfa')
Range = line 6112 - 9166,  guess = 7639 ('Chrysostomia')
Range = line 7640 - 9166,  guess = 8403 ('Curzio')
Range = line 8404 - 9166,  guess = 8785 ('Danu?e')
Range = line 8404 - 8784,  guess = 8594 ('Dalbir')
Range = line 8595 - 8784,  guess = 8689 ('Dan Daniel')
Range = line 8595 - 8688,  guess = 8641 ('Dalva')
Range = line 8642 - 8688,  guess = 8665 ('Damion')
Range = line 8666 - 8688,  guess = 8677 ('Dan')
Range = line 8666 - 8677,  guess = 8671 ('Damnjan')
Range = line 8672 - 8677,  guess = 8674 ('Damyan')
Range = line 8675 - 8677,  guess = 8676 ('Dan')
Range = line 8675 - 8676,  guess = 8675 ('Damyanti')
Range = line 8676 - 8676,  guess = 8676 ('Dan')
Result: name 'Dan' found
evaluating name 'Dan':  'is male'  (country = Great Britain[2] or Ireland[3] or U.S.A.[4] or Belgium[1] or Luxembourg[4] or the Netherlands[1] or Swiss[1] or Denmark[6] or Norway[2] or Sweden[6] or Finland[3] or Romania[8] or Moldova[6] or Israel[7])
evaluating name 'Dan':  'is mostly male'  (country = Vietnam[6])
evaluating name 'Dan':  'is uni*** name'  (country = China[7])
result for 'Dan':  'is uni*** name'
int(63)
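The "Range = line lo - hi, guess = mid" lines in both traces are consistent with a bisection over the sorted dictionary that, on a match, keeps narrowing until it reaches the first matching line (the 'Dan' search continues after hitting 'Dan' at lines 8677 and 8676). A sketch of that search, reconstructed from the trace rather than taken from the extension's source: `findFirstLine()` is a hypothetical name, and it compares with plain `strcmp()`, whereas the extension may use its own comparison for special codings such as `<i>` and `<sch>`.

```php
<?php
// Hypothetical reconstruction of the search visible in the trace:
// 1-based line numbers, mid = floor((lo + hi) / 2), and on a match the
// upper bound moves to mid so that the first matching line is returned.
function findFirstLine(array $lines, string $name, array &$log = []): ?int
{
    $lo = 1;
    $hi = count($lines);
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        $entry = $lines[$mid - 1];
        $log[] = sprintf("Range = line %d - %d,  guess = %d ('%s')", $lo, $hi, $mid, $entry);
        $cmp = strcmp($entry, $name);
        if ($cmp < 0) {
            $lo = $mid + 1;      // everything up to mid sorts before $name
        } elseif ($cmp > 0) {
            $hi = $mid - 1;      // everything from mid on sorts after $name
        } elseif ($lo === $mid) {
            return $mid;         // all lines before lo sort before $name
        } else {
            $hi = $mid;          // match, but an earlier line may match too
        }
    }
    return null; // name not found
}

$log = [];
$sorted = ['Aranita', 'Barak', 'Ben', 'Ben', 'Benan', 'Dan'];
var_dump(findFirstLine($sorted, 'Ben', $log)); // int(3)
echo implode("\n", $log), "\n";
```

Applied by hand to the 'Ben' trace, the same update rules reproduce every range shown, from "line 1 - 48891" down to the match at line 4933.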


------------------------------------------------------------------------


-- 
PECL development discussion Mailing List (http://pecl.php.net/)