Javed,
Your explanation allows many other ways to look at the problem.
Some of them skip steps and get to the point faster. Of course, I do not know
what exactly you mean by the "fairness object" other than guessing it does an
evaluation of what you supply and lets you know if it is fair.
For something categorical like gender it used to be easy to use the table()
function to show how many of each category you have. Of course, it now seems
that old assumptions about two genders are being replaced by additional choices
so it may literally be nonbinary.
Your code looked for 'T13' which gives no clue about purpose. Here is an
example where I coded the words "male" and "female" in a small sample for
illustration. You can leave the data as is and have it automatically count or
take percentages and then extract whatever you want and use it to make
decisions.
The darn HTML stripper this list uses makes showing code hard, so I have to
disperse it with extra spacing.
Here is some data:
gender <- c("male", "female", "female", "male", "female", "female", "female")
I made it lopsided and you can see the counts easily enough with:
tab.cnt <- table(gender)
The output is:
> tab.cnt
gender
female   male
     5      2
You can of course get percentages using the table object:
tab.prcnt <- prop.table(tab.cnt)
The output is:
> tab.prcnt
gender
   female      male
0.7142857 0.2857143
You can, of course, multiply the above by a hundred and use round() to trim it
to fewer digits. More usefully, you can extract the numbers to do things like a
comparison.
Consider deciding that more than 60% females is too much:
if (tab.prcnt[["female"]] > 0.6) print("too many women")
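As a rough sketch of the rounding and comparison just described (the names
pct, limit and over are invented for illustration, and the cutoff of 60 is
only the example threshold used above):

```r
# Same sample data as above, repeated so this runs on its own.
gender <- c("male", "female", "female", "male", "female", "female", "female")

# Percentages rounded to one decimal place.
pct <- round(100 * prop.table(table(gender)), 1)

# Hypothetical cutoff: flag any category above 60 percent.
limit <- 60
over <- names(pct)[as.vector(pct) > limit]

if (length(over) > 0) {
  print(paste("over the limit:", paste(over, collapse = ", ")))
}
```

Keeping the cutoff in a variable means the same test works unchanged if more
categories are added later.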
Your criteria may of course be more complicated, but the thing I am teaching is
that there are built-in methods that may be used as you get to know not only
the language but techniques that work well with it. Your need may work well
with your technique of converting your data representation from one form to a
numeric form. Realistically, many might simply use another built-in feature
called factors. Converting my data to a factor does this:
> fact <- factor(gender)
> fact
[1] male   female female male   female female female
Levels: female male
> as.numeric(fact)
[1] 2 1 1 2 1 1 1
The default is to use integers starting with 1 but you can change that in many
ways, or in the above, simply subtract 1 to get what you want. To get the
proportion of men in the above, you can do something like this:
> mean(as.numeric(fact) - 1)
[1] 0.2857143
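A minimal sketch of two variations on the same idea, assuming you want to
control which category becomes 0 and which becomes 1 (the names prop_male,
fact2, codes and prop_female are made up for this example):

```r
gender <- c("male", "female", "female", "male", "female", "female", "female")

# Without any factor: the mean of a logical vector is the proportion of TRUEs.
prop_male <- mean(gender == "male")

# With explicit levels, "male" is coded 1 and "female" 2 instead of the
# default alphabetical order, so subtracting 1 gives male = 0, female = 1.
fact2 <- factor(gender, levels = c("male", "female"))
codes <- as.integer(fact2) - 1L

prop_female <- mean(codes)
```

Setting the levels yourself avoids depending on alphabetical order, which
would silently change the coding if a category were renamed.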
You may get lots of advice on many methods and ways to do things but pick what
fits your situation and sometimes you can try to change the situation. For some
purposes, categorical data needs to be transformed for proper use in something
like machine learning algorithms but sometimes it can be left alone as shown
above and the statistics can be worked with.
From: javed khan <[email protected]>
To: Avi Gross <[email protected]>
Cc: [email protected] <[email protected]>
Sent: Fri, Jan 28, 2022 8:34 am
Subject: Re: Error in if (fraction <= 1) { : missing value where TRUE/FALSE
needed
Avi Gross, thanks for your reply.
I have no interest in using zero and one in my code; TRUE and FALSE would
also be ok because I don't have to do any arithmetic with them.
I just want to pass a protected variable and one of its (privileged) values to
the fairness object to see if the model has any bias towards the unprivileged
values of the protected variable.
You can consider my protected variable as Sex and its values as male and
female. I want the fairness object to see if there is any bias towards the
female group which could be considered as an unprivileged group.
Thanks
On Thursday, January 27, 2022, Avi Gross via R-help <[email protected]>
wrote:
Javed,
You may misunderstand something here.
Forget ifelse() which does all kinds of things (which you can see by just
typing "ifelse" and a carriage return or ENTER).
Your initial goal should be kept in mind. You want to create a data structure,
in this case a vector, that is the same length as another vector called
test$operator in which you mark whether the corresponding element was exactly
"T13" or not.
There is nothing fundamentally wrong with your approach albeit it is overkill
in this case. As has been pointed out, SKIPPING ifelse() entirely, you can get
a vector of Logicals (TRUE or FALSE) by a simple command like this:
result <- test$operator == 'T13'
For many purposes, that is all you need. TRUE and FALSE are also sometimes
mapped into 1 and 0 for various purposes, so you can convert them into integers
or general numerics if that is needed. Consider the following code that checks
the integers from 1 to 7 to see if they are even (as in divisible by 2):
> result <- 1:7 %% 2 == 0
> result
[1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
> as.integer(result)
[1] 0 1 0 1 0 1 0
> as.numeric(result)
[1] 0 1 0 1 0 1 0
> result <- as.integer(1:7 %% 2 == 0)
> result
[1] 0 1 0 1 0 1 0
If for some reason the choice of 1 and 0 is the opposite of what you need, you
can invert them several ways with the simplest being:
as.integer(1:7 %% 2 != 0)
or
as.integer(!(1:7 %% 2 == 0))
The first negates the comparison and the second just flips every FALSE and TRUE
to the other.
Why are we talking about this? For many more interesting cases, ifelse() is
great as you can replace one or both of the choices with anything. A very
common case is replacing one choice with itself and changing the other, or
nesting the comparisons in a sort of simulated tree as in
ifelse(some_condition,
       ifelse(second_condition, result1, result2),
       ifelse(third_condition, result3, result4))
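To make both patterns concrete, here is a small sketch with made-up numbers
(the vector x and the labels are purely illustrative):

```r
x <- c(-50, -5, 3, 12)

# Replace one choice and keep the other as itself: negatives become 0.
clipped <- ifelse(x < 0, 0, x)

# Nested comparisons forming a small decision tree.
label <- ifelse(x >= 0,
                ifelse(x < 10, "small positive", "large positive"),
                ifelse(x > -10, "small negative", "large negative"))
```

Each element of x is routed through the tree independently, so the result
has the same length as x.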
But you seem to want the simplest return of two values that also happen to be
the underlying equivalent of TRUE and FALSE in many languages. In R,
anything that evaluates to zero (or the Boolean value FALSE) tends to be
treated as FALSE, and anything else like a 1 or 666 is treated as TRUE, as
shown below:
> if (TRUE) print("TRUE") else print("FALSE")
[1] "TRUE"
> if (1) print("TRUE") else print("FALSE")
[1] "TRUE"
> if (666) print("TRUE") else print("FALSE")
[1] "TRUE"
> if (FALSE) print("TRUE") else print("FALSE")
[1] "FALSE"
> if (0) print("TRUE") else print("FALSE")
[1] "FALSE"
This is why you are being told that for many purposes, the Boolean vector may
work fine. But if you really want or need zero and one, that is a trivial
transformation as shown. Feel free to use ifelse() and then figure out what
went wrong with your code, but also to try the simpler version and see if the
problem goes away.
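One thing worth checking while you debug: the error in the subject line is
what R reports when the condition given to if() is NA. Nothing in this thread
shows whether test$operator actually contains missing values, so the sketch
below is only a guess at one possible cause, using an invented stand-in
vector called operator:

```r
# Invented stand-in for test$operator, with one missing value.
operator <- c("T13", "T2", NA, "T13")

# == propagates the NA into the result.
result <- operator == "T13"

# %in% never returns NA; a missing value simply does not match.
safe <- operator %in% "T13"

# Feeding the NA element to if() raises the error; capture its message.
msg <- tryCatch(
  if (result[3]) "yes" else "no",
  error = function(e) conditionMessage(e)
)
print(msg)
```

If anyNA(test$operator) turns out to be TRUE on your data, that would be
worth ruling out before looking elsewhere.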
Avi
-----Original Message-----
From: javed khan <[email protected]>
To: Bert Gunter <[email protected]>
Cc: R-help <[email protected]>
Sent: Thu, Jan 27, 2022 1:15 pm
Subject: Re: [R] Error in if (fraction <= 1) { : missing value where TRUE/FALSE
needed
Thank you Bert Gunter
Do you mean I should do something like this:
prot <- as.numeric(ifelse(test$operator == 'T13', 1, 0))