Re: [R] How to generate SE for the proportion value using a randomization process in R?

2021-01-28 Thread Rui Barradas

Hello,

Yes, sorry for my previous post, I had forgotten about boot.array.
That's a much better solution for your problem.

Rui Barradas

Às 20:29 de 28/01/21, Marna Wagley escreveu:

Thank you Rui,
This is great. How about the following?

SimilatedData<-boot.array(b, indices=T)

seems it is giving the rows ID which are used in the calculation, isn't it?




On Thu, Jan 28, 2021 at 12:21 PM Rui Barradas > wrote:


Hello,

I don't know why you would need to see the indices but rewrite the
function bootprop as

bootprop_ind <- function(data, index){
    d <- data[index, ]
    #sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
    index
}


and call in the same way. It will now return a matrix of indices with R
= 1000 rows and 19 columns.

Hope this helps,

Rui Barradas


Às 19:29 de 28/01/21, Marna Wagley escreveu:
 > Hi Rui,
 > I am sorry for asking you several questions.
 >
 > In the given example, randomizations (reshuffle) were done 1000
times,
 > and its 1000 proportion values (results) are stored and it can be
seen
 > using b$t; but I was wondering how the table was randomized
(which rows
 > have been missed/or repeated in each randomizing procedure?).
 >
 > Is there any way we can see the randomized table and its associated
 > results? Here in this example, I randomized (or bootstrapped) the
table
 > into three times (R=3) so I would like to store these three
tables and
 > look at them later to know which rows were repeated/missed. Is
there any
 > possibility?
 > The example data and the code is given below.
 >
 > Thank you for your help.
 >
 > 
 > library(boot)
 > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
 > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
= c("id1",
 > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
 > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
 > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
 > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
 > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
 > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
"data.frame",
 > row.names = c(NA,
 > -19L))
 > daT<-data.frame(dat %>%
 >    mutate(Time1.but.not.in.Time2 = case_when(
 >              Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
 > Time2.but.not.in.Time1 = case_when(
 >              Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
 >   BothTimes = case_when(
 >              Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
 > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
 > "BothTimes")
 > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
 > summary(daT)
 >
 > bootprop <- function(data, index){
 >     d <- data[index, ]
 >     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm =
TRUE)
 > }
 >
 > R <- 3
 > set.seed(2020)
 > b <- boot(daT, bootprop, R)
 > b
 > b$t0     # original
 > b$t
 > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
 > hist(b$t, freq = FALSE)
 >
 > str(b)
 > b$data
 > b$seed
 > b$sim
 > b$strata
 > 
 >
 >
 > On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley
mailto:marna.wag...@gmail.com>
 > >>
wrote:
 >
 >     Yes Rui, I can see we don't need to divide by square root of
sample
 >     size. The example is great to understand it.
 >     Thank you.
 >     Marna
 >
 >
 >     On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas
mailto:ruipbarra...@sapo.pt>
 >     >>
wrote:
 >
 >         Hello,
 >
 >         Inline.
 >
 >         Às 07:47 de 23/01/21, Marna Wagley escreveu:
 >          > Dear Rui,
 >          > I was wondering whether we have to square root of SD
to find
 >         SE, right?
 >
 >         No, we don't. var already divides by n, don't divide again.
 >         This is the code, that can be seen by running the
function name
 >         at a
 >         command line.
 >
 >
 >         sd
 >         #function (x, na.rm = FALSE)
 >         #sqrt(var(if (is.vector(x) || is.factor(x)) x else
as.double(x),
 >         #    na.rm = na.rm))
 >         #
 >         #
 >
 >
 >
 >          >
 >          > bootprop <- function(data, index){
 >          >     d <- data[index, ]
 >          >     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
 >         na.rm = TRUE)
 >          > }
 >          >
 >  

Re: [R] How to generate SE for the proportion value using a randomization process in R?

2021-01-28 Thread Marna Wagley
Thank you Rui,
This is great. How about the following?

SimilatedData<-boot.array(b, indices=T)

seems it is giving the rows ID which are used in the calculation, isn't it?




On Thu, Jan 28, 2021 at 12:21 PM Rui Barradas  wrote:

> Hello,
>
> I don't know why you would need to see the indices but rewrite the
> function bootprop as
>
> bootprop_ind <- function(data, index){
>d <- data[index, ]
>#sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
>index
> }
>
>
> and call in the same way. It will now return a matrix of indices with R
> = 1000 rows and 19 columns.
>
> Hope this helps,
>
> Rui Barradas
>
>
> Às 19:29 de 28/01/21, Marna Wagley escreveu:
> > Hi Rui,
> > I am sorry for asking you several questions.
> >
> > In the given example, randomizations (reshuffle) were done 1000 times,
> > and its 1000 proportion values (results) are stored and it can be seen
> > using b$t; but I was wondering how the table was randomized (which rows
> > have been missed/or repeated in each randomizing procedure?).
> >
> > Is there any way we can see the randomized table and its associated
> > results? Here in this example, I randomized (or bootstrapped) the table
> > into three times (R=3) so I would like to store these three tables and
> > look at them later to know which rows were repeated/missed. Is there any
> > possibility?
> > The example data and the code is given below.
> >
> > Thank you for your help.
> >
> > 
> > library(boot)
> > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
> > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label =
> c("id1",
> > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
> > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
> > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
> > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
> > row.names = c(NA,
> > -19L))
> > daT<-data.frame(dat %>%
> >mutate(Time1.but.not.in.Time2 = case_when(
> >  Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
> > Time2.but.not.in.Time1 = case_when(
> >  Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
> >   BothTimes = case_when(
> >  Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
> > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> > "BothTimes")
> > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> > summary(daT)
> >
> > bootprop <- function(data, index){
> > d <- data[index, ]
> > sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
> > }
> >
> > R <- 3
> > set.seed(2020)
> > b <- boot(daT, bootprop, R)
> > b
> > b$t0 # original
> > b$t
> > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> > hist(b$t, freq = FALSE)
> >
> > str(b)
> > b$data
> > b$seed
> > b$sim
> > b$strata
> > 
> >
> >
> > On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley  > > wrote:
> >
> > Yes Rui, I can see we don't need to divide by square root of sample
> > size. The example is great to understand it.
> > Thank you.
> > Marna
> >
> >
> > On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas  > > wrote:
> >
> > Hello,
> >
> > Inline.
> >
> > Às 07:47 de 23/01/21, Marna Wagley escreveu:
> >  > Dear Rui,
> >  > I was wondering whether we have to square root of SD to find
> > SE, right?
> >
> > No, we don't. var already divides by n, don't divide again.
> > This is the code, that can be seen by running the function name
> > at a
> > command line.
> >
> >
> > sd
> > #function (x, na.rm = FALSE)
> > #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
> > #na.rm = na.rm))
> > #
> > #
> >
> >
> >
> >  >
> >  > bootprop <- function(data, index){
> >  > d <- data[index, ]
> >  > sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
> > na.rm = TRUE)
> >  > }
> >  >
> >  > R <- 1e3
> >  > set.seed(2020)
> >  > b <- boot(daT, bootprop, R)
> >  > b
> >  > b$t0 # original
> >  > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> >  > sd(b$t)/sqrt(1000)
> >  > pandit*(1-pandit)
> >  >
> >  > hist(b$t, freq = FALSE)
> >
> >
> > Try plotting the normal densities for both cases, the red line is
> > clearly wrong.
> >
> >
> > f <- function(x, xbar, s){
> > dnorm(x, mean = xbar, sd = s)
> > }
> >
> > hist(b$t, freq = FALSE)
> > curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col =
> > "blue",
> > add = TRUE)
> > curve(f(x, 

Re: [R] How to generate SE for the proportion value using a randomization process in R?

2021-01-28 Thread Rui Barradas

Hello,

I don't know why you would need to see the indices but rewrite the 
function bootprop as


bootprop_ind <- function(data, index){
  d <- data[index, ]
  #sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
  index
}


and call in the same way. It will now return a matrix of indices with R 
= 1000 rows and 19 columns.


Hope this helps,

Rui Barradas


Às 19:29 de 28/01/21, Marna Wagley escreveu:

Hi Rui,
I am sorry for asking you several questions.

In the given example, randomizations (reshuffle) were done 1000 times, 
and its 1000 proportion values (results) are stored and it can be seen 
using b$t; but I was wondering how the table was randomized (which rows 
have been missed/or repeated in each randomizing procedure?).


Is there any way we can see the randomized table and its associated 
results? Here in this example, I randomized (or bootstrapped) the table 
into three times (R=3) so I would like to store these three tables and 
look at them later to know which rows were repeated/missed. Is there any 
possibility?

The example data and the code is given below.

Thank you for your help.


library(boot)
dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("id1",
"id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
"id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
"id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame", 
row.names = c(NA,

-19L))
daT<-data.frame(dat %>%
   mutate(Time1.but.not.in.Time2 = case_when(
             Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
Time2.but.not.in.Time1 = case_when(
             Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
  BothTimes = case_when(
             Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1", 
"BothTimes")

daT[cols.num] <- sapply(daT[cols.num],as.numeric)
summary(daT)

bootprop <- function(data, index){
    d <- data[index, ]
    sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}

R <- 3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0     # original
b$t
sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)

str(b)
b$data
b$seed
b$sim
b$strata



On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley > wrote:


Yes Rui, I can see we don't need to divide by square root of sample
size. The example is great to understand it.
Thank you.
Marna


On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas mailto:ruipbarra...@sapo.pt>> wrote:

Hello,

Inline.

Às 07:47 de 23/01/21, Marna Wagley escreveu:
 > Dear Rui,
 > I was wondering whether we have to square root of SD to find
SE, right?

No, we don't. var already divides by n, don't divide again.
This is the code, that can be seen by running the function name
at a
command line.


sd
#function (x, na.rm = FALSE)
#sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
#    na.rm = na.rm))
#
#



 >
 > bootprop <- function(data, index){
 >     d <- data[index, ]
 >     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
na.rm = TRUE)
 > }
 >
 > R <- 1e3
 > set.seed(2020)
 > b <- boot(daT, bootprop, R)
 > b
 > b$t0     # original
 > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
 > sd(b$t)/sqrt(1000)
 > pandit*(1-pandit)
 >
 > hist(b$t, freq = FALSE)


Try plotting the normal densities for both cases, the red line is
clearly wrong.


f <- function(x, xbar, s){
    dnorm(x, mean = xbar, sd = s)
}

hist(b$t, freq = FALSE)
curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col =
"blue",
add = TRUE)
curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1,
col =
"red", add = TRUE)


Hope this helps,

Rui Barradas

 >
 >
 >
 >
 > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas
mailto:ruipbarra...@sapo.pt>
 > >>
wrote:
 >
 >     Hello,
 >
 >     Something like this, using base package boot?
 >
 >
 >     library(boot)
 >
 >     bootprop <- function(data, index){
 >         d <- data[index, ]
 >         sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
na.rm = TRUE)
 >     }
 >
 

Re: [R] How to generate SE for the proportion value using a randomization process in R?

2021-01-28 Thread Marna Wagley
Hi Rui,
I am sorry for asking you several questions.

In the given example, randomizations (reshuffle) were done 1000 times, and
its 1000 proportion values (results) are stored and it can be seen using
b$t; but I was wondering how the table was randomized (which rows have been
missed/or repeated in each randomizing procedure?).

Is there any way we can see the randomized table and its associated
results? Here in this example, I randomized (or bootstrapped) the table
into three times (R=3) so I would like to store these three tables and look
at them later to know which rows were repeated/missed. Is there any
possibility?
The example data and the code is given below.

Thank you for your help.


library(boot)
dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("id1",
"id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
"id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
"id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
row.names = c(NA,
-19L))
daT<-data.frame(dat %>%
  mutate(Time1.but.not.in.Time2 = case_when(
Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
Time2.but.not.in.Time1 = case_when(
Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
 BothTimes = case_when(
Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
"BothTimes")
daT[cols.num] <- sapply(daT[cols.num],as.numeric)
summary(daT)

bootprop <- function(data, index){
   d <- data[index, ]
   sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}

R <- 3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0 # original
b$t
sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)

str(b)
b$data
b$seed
b$sim
b$strata



On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley 
wrote:

> Yes Rui, I can see we don't need to divide by square root of sample size.
> The example is great to understand it.
> Thank you.
> Marna
>
>
> On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas 
> wrote:
>
>> Hello,
>>
>> Inline.
>>
>> Às 07:47 de 23/01/21, Marna Wagley escreveu:
>> > Dear Rui,
>> > I was wondering whether we have to square root of SD to find SE, right?
>>
>> No, we don't. var already divides by n, don't divide again.
>> This is the code, that can be seen by running the function name at a
>> command line.
>>
>>
>> sd
>> #function (x, na.rm = FALSE)
>> #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
>> #na.rm = na.rm))
>> #
>> #
>>
>>
>>
>> >
>> > bootprop <- function(data, index){
>> > d <- data[index, ]
>> > sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
>> > }
>> >
>> > R <- 1e3
>> > set.seed(2020)
>> > b <- boot(daT, bootprop, R)
>> > b
>> > b$t0 # original
>> > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
>> > sd(b$t)/sqrt(1000)
>> > pandit*(1-pandit)
>> >
>> > hist(b$t, freq = FALSE)
>>
>>
>> Try plotting the normal densities for both cases, the red line is
>> clearly wrong.
>>
>>
>> f <- function(x, xbar, s){
>>dnorm(x, mean = xbar, sd = s)
>> }
>>
>> hist(b$t, freq = FALSE)
>> curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col = "blue",
>> add = TRUE)
>> curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1, col =
>> "red", add = TRUE)
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> >
>> >
>> >
>> >
>> > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas > > > wrote:
>> >
>> > Hello,
>> >
>> > Something like this, using base package boot?
>> >
>> >
>> > library(boot)
>> >
>> > bootprop <- function(data, index){
>> > d <- data[index, ]
>> > sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm =
>> TRUE)
>> > }
>> >
>> > R <- 1e3
>> > set.seed(2020)
>> > b <- boot(daT, bootprop, R)
>> > b
>> > b$t0 # original
>> > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
>> > hist(b$t, freq = FALSE)
>> >
>> >
>> > Hope this helps,
>> >
>> > Rui Barradas
>> >
>> > Às 21:57 de 22/01/21, Marna Wagley escreveu:
>> >  > Hi All,
>> >  > I was trying to estimate standard error (SE) for the proportion
>> > value using
>> >  > some kind of randomization process (bootstrapping or jackknifing)
>> > in R, but
>> >  > I could not figure it out.
>> >  >
>> >  > Is there any way to generate SE for the proportion?
>> >  >
>> >  > The example of the data and the code I am using is attached for
>> your
>> >  > reference. I would like to generate the value of proportion with
>> > a SE using
>> >  > a 1000 times randomization.
>> >  >
>> >  > 

Re: [R] How to generate SE for the proportion value using a randomization process in R?

2021-01-23 Thread Marna Wagley
Yes Rui, I can see we don't need to divide by square root of sample size.
The example is great to understand it.
Thank you.
Marna


On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas  wrote:

> Hello,
>
> Inline.
>
> Às 07:47 de 23/01/21, Marna Wagley escreveu:
> > Dear Rui,
> > I was wondering whether we have to square root of SD to find SE, right?
>
> No, we don't. var already divides by n, don't divide again.
> This is the code, that can be seen by running the function name at a
> command line.
>
>
> sd
> #function (x, na.rm = FALSE)
> #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
> #na.rm = na.rm))
> #
> #
>
>
>
> >
> > bootprop <- function(data, index){
> > d <- data[index, ]
> > sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
> > }
> >
> > R <- 1e3
> > set.seed(2020)
> > b <- boot(daT, bootprop, R)
> > b
> > b$t0 # original
> > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> > sd(b$t)/sqrt(1000)
> > pandit*(1-pandit)
> >
> > hist(b$t, freq = FALSE)
>
>
> Try plotting the normal densities for both cases, the red line is
> clearly wrong.
>
>
> f <- function(x, xbar, s){
>dnorm(x, mean = xbar, sd = s)
> }
>
> hist(b$t, freq = FALSE)
> curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col = "blue",
> add = TRUE)
> curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1, col =
> "red", add = TRUE)
>
>
> Hope this helps,
>
> Rui Barradas
>
> >
> >
> >
> >
> > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas  > > wrote:
> >
> > Hello,
> >
> > Something like this, using base package boot?
> >
> >
> > library(boot)
> >
> > bootprop <- function(data, index){
> > d <- data[index, ]
> > sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm =
> TRUE)
> > }
> >
> > R <- 1e3
> > set.seed(2020)
> > b <- boot(daT, bootprop, R)
> > b
> > b$t0 # original
> > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> > hist(b$t, freq = FALSE)
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > Às 21:57 de 22/01/21, Marna Wagley escreveu:
> >  > Hi All,
> >  > I was trying to estimate standard error (SE) for the proportion
> > value using
> >  > some kind of randomization process (bootstrapping or jackknifing)
> > in R, but
> >  > I could not figure it out.
> >  >
> >  > Is there any way to generate SE for the proportion?
> >  >
> >  > The example of the data and the code I am using is attached for
> your
> >  > reference. I would like to generate the value of proportion with
> > a SE using
> >  > a 1000 times randomization.
> >  >
> >  > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L,
> 16L,
> >  > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
> > = c("id1",
> >  > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
> >  > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
> >  > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
> >  > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> >  > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> >  > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
> > "data.frame",
> >  > row.names = c(NA,
> >  > -19L))
> >  > daT<-data.frame(dat %>%
> >  >mutate(Time1.but.not.in.Time2 = case_when(
> >  >  Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
> >  > Time2.but.not.in.Time1 = case_when(
> >  >  Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
> >  >   BothTimes = case_when(
> >  >  Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
> >  >   daT
> >  >   summary(daT)
> >  >
> >  > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> >  > "BothTimes")
> >  > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> >  > summary(daT)
> >  > ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1,
> na.rm=T)
> >  > ProportionValue
> >  > standard error??
> >  >
> >  >   [[alternative HTML version deleted]]
> >  >
> >  > __
> >  > R-help@r-project.org  mailing list
> > -- To UNSUBSCRIBE and more, see
> >  > https://stat.ethz.ch/mailman/listinfo/r-help
> >  > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> >  > and provide commented, minimal, self-contained, reproducible code.
> >  >
> >
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, 

Re: [R] How to generate SE for the proportion value using a randomization process in R?

2021-01-23 Thread Rui Barradas

Hello,

Inline.

Às 07:47 de 23/01/21, Marna Wagley escreveu:

Dear Rui,
I was wondering whether we have to square root of SD to find SE, right?


No, we don't. var already divides by n, don't divide again.
This is the code, that can be seen by running the function name at a 
command line.



sd
#function (x, na.rm = FALSE)
#sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
#na.rm = na.rm))
#
#





bootprop <- function(data, index){
    d <- data[index, ]
    sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}

R <- 1e3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0     # original
sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
sd(b$t)/sqrt(1000)
pandit*(1-pandit)

hist(b$t, freq = FALSE)



Try plotting the normal densities for both cases, the red line is 
clearly wrong.



f <- function(x, xbar, s){
  dnorm(x, mean = xbar, sd = s)
}

hist(b$t, freq = FALSE)
curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col = "blue", 
add = TRUE)
curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1, col = 
"red", add = TRUE)



Hope this helps,

Rui Barradas






On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas > wrote:


Hello,

Something like this, using base package boot?


library(boot)

bootprop <- function(data, index){
    d <- data[index, ]
    sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}

R <- 1e3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0     # original
sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)


Hope this helps,

Rui Barradas

Às 21:57 de 22/01/21, Marna Wagley escreveu:
 > Hi All,
 > I was trying to estimate standard error (SE) for the proportion
value using
 > some kind of randomization process (bootstrapping or jackknifing)
in R, but
 > I could not figure it out.
 >
 > Is there any way to generate SE for the proportion?
 >
 > The example of the data and the code I am using is attached for your
 > reference. I would like to generate the value of proportion with
a SE using
 > a 1000 times randomization.
 >
 > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
 > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
= c("id1",
 > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
 > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
 > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
 > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
 > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
 > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
"data.frame",
 > row.names = c(NA,
 > -19L))
 > daT<-data.frame(dat %>%
 >    mutate(Time1.but.not.in.Time2 = case_when(
 >              Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
 > Time2.but.not.in.Time1 = case_when(
 >              Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
 >   BothTimes = case_when(
 >              Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
 >   daT
 >   summary(daT)
 >
 > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
 > "BothTimes")
 > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
 > summary(daT)
 > ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1, na.rm=T)
 > ProportionValue
 > standard error??
 >
 >       [[alternative HTML version deleted]]
 >
 > __
 > R-help@r-project.org  mailing list
-- To UNSUBSCRIBE and more, see
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.
 >



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to generate SE for the proportion value using a randomization process in R?

2021-01-22 Thread Marna Wagley
Dear Rui,
I was wondering whether we have to square root of SD to find SE, right?

bootprop <- function(data, index){
   d <- data[index, ]
   sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}

R <- 1e3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0 # original
sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
sd(b$t)/sqrt(1000)
pandit*(1-pandit)

hist(b$t, freq = FALSE)




On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas  wrote:

> Hello,
>
> Something like this, using base package boot?
>
>
> library(boot)
>
> bootprop <- function(data, index){
>d <- data[index, ]
>sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
> }
>
> R <- 1e3
> set.seed(2020)
> b <- boot(daT, bootprop, R)
> b
> b$t0 # original
> sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> hist(b$t, freq = FALSE)
>
>
> Hope this helps,
>
> Rui Barradas
>
> Às 21:57 de 22/01/21, Marna Wagley escreveu:
> > Hi All,
> > I was trying to estimate standard error (SE) for the proportion value
> using
> > some kind of randomization process (bootstrapping or jackknifing) in R,
> but
> > I could not figure it out.
> >
> > Is there any way to generate SE for the proportion?
> >
> > The example of the data and the code I am using is attached for your
> > reference. I would like to generate the value of proportion with a SE
> using
> > a 1000 times randomization.
> >
> > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
> > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label =
> c("id1",
> > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
> > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
> > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
> > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
> > row.names = c(NA,
> > -19L))
> > daT<-data.frame(dat %>%
> >mutate(Time1.but.not.in.Time2 = case_when(
> >  Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
> > Time2.but.not.in.Time1 = case_when(
> >  Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
> >   BothTimes = case_when(
> >  Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
> >   daT
> >   summary(daT)
> >
> > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> > "BothTimes")
> > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> > summary(daT)
> > ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1, na.rm=T)
> > ProportionValue
> > standard error??
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to generate SE for the proportion value using a randomization process in R?

2021-01-22 Thread Rui Barradas

Hello,

Something like this, using base package boot?


library(boot)

bootprop <- function(data, index){
  d <- data[index, ]
  sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}

R <- 1e3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0 # original
sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)


Hope this helps,

Rui Barradas

Às 21:57 de 22/01/21, Marna Wagley escreveu:

Hi All,
I was trying to estimate standard error (SE) for the proportion value using
some kind of randomization process (bootstrapping or jackknifing) in R, but
I could not figure it out.

Is there any way to generate SE for the proportion?

The example of the data and the code I am using is attached for your
reference. I would like to generate the value of proportion with a SE using
a 1000 times randomization.

dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("id1",
"id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
"id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
"id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
row.names = c(NA,
-19L))
daT<-data.frame(dat %>%
   mutate(Time1.but.not.in.Time2 = case_when(
 Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
Time2.but.not.in.Time1 = case_when(
 Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
  BothTimes = case_when(
 Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
  daT
  summary(daT)

cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
"BothTimes")
daT[cols.num] <- sapply(daT[cols.num],as.numeric)
summary(daT)
ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1, na.rm=T)
ProportionValue
standard error??

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.