
[R] identifying convergence or non-convergence of mixed-effects regression model in lme4 from model output

2017-12-25 Thread Aleksander Główka

 ..@ Dim : int [1:2] 10 10
   .. ..@ Dimnames:List of 2
   .. .. ..$ : chr [1:10] "(Intercept)" "FreqABCD.log.std" "LogitABCD.neg.log.std" "MIABCD.neg.log.std" ...
   .. .. ..$ : chr [1:10] "(Intercept)" "FreqABCD.log.std" "LogitABCD.neg.log.std" "MIABCD.neg.log.std" ...
   .. ..@ uplo    : chr "U"
   .. ..@ factors :List of 1
   .. .. ..$ correlation:Formal class 'corMatrix' [package "Matrix"] with 6 slots
   .. .. .. .. ..@ sd  : num [1:10] 0.0339 0.0519 0.013 0.0439 0.0068 ...
   .. .. .. .. ..@ x   : num [1:100] 1 0.0194 -0.1162 0.0147 0.0158 ...
   .. .. .. .. ..@ Dim : int [1:2] 10 10
   .. .. .. .. ..@ Dimnames:List of 2
   .. .. .. .. .. ..$ : chr [1:10] "(Intercept)" "FreqABCD.log.std" "LogitABCD.neg.log.std" "MIABCD.neg.log.std" ...
   .. .. .. .. .. ..$ : chr [1:10] "(Intercept)" "FreqABCD.log.std" "LogitABCD.neg.log.std" "MIABCD.neg.log.std" ...
   .. .. .. .. ..@ uplo    : chr "U"
   .. .. .. .. ..@ factors :List of 1
   .. .. .. .. .. ..$ Cholesky:Formal class 'Cholesky' [package "Matrix"] with 5 slots
   .. .. .. .. .. .. .. ..@ x   : num [1:100] 1 0 0 0 0 0 0 0 0 0 ...
   .. .. .. .. .. .. .. ..@ Dim : int [1:2] 10 10
   .. .. .. .. .. .. .. ..@ Dimnames:List of 2
   .. .. .. .. .. .. .. .. ..$ : NULL
   .. .. .. .. .. .. .. .. ..$ : NULL
   .. .. .. .. .. .. .. ..@ uplo    : chr "U"
   .. .. .. .. .. .. .. ..@ diag    : chr "N"
  $ varcor  :List of 2
   ..$ subj: num [1, 1] 0.0273
   .. ..- attr(*, "dimnames")=List of 2
   .. .. ..$ : chr "(Intercept)"
   .. .. ..$ : chr "(Intercept)"
   .. ..- attr(*, "stddev")= Named num 0.165
   .. .. ..- attr(*, "names")= chr "(Intercept)"
   .. ..- attr(*, "correlation")= num [1, 1] 1
   .. .. ..- attr(*, "dimnames")=List of 2
   .. .. .. ..$ : chr "(Intercept)"
   .. .. .. ..$ : chr "(Intercept)"
   ..$ item: num [1:2, 1:2] 0.00417 0.000484 0.000484 0.00289
   .. ..- attr(*, "dimnames")=List of 2
   .. .. ..$ : chr [1:2] "(Intercept)" "FreqABCD.log.std"
   .. .. ..$ : chr [1:2] "(Intercept)" "FreqABCD.log.std"
   .. ..- attr(*, "stddev")= Named num [1:2] 0.0646 0.0538
   .. .. ..- attr(*, "names")= chr [1:2] "(Intercept)" "FreqABCD.log.std"
   .. ..- attr(*, "correlation")= num [1:2, 1:2] 1 0.139 0.139 1
   .. .. ..- attr(*, "dimnames")=List of 2
   .. .. .. ..$ : chr [1:2] "(Intercept)" "FreqABCD.log.std"
   .. .. .. ..$ : chr [1:2] "(Intercept)" "FreqABCD.log.std"
   ..- attr(*, "sc")= num 0.239
   ..- attr(*, "useSc")= logi TRUE
   ..- attr(*, "class")= chr "VarCorr.merMod"
  $ AICtab  : Named num [1:5] 159.7 241.6 -64.8 129.7 1727
   ..- attr(*, "names")= chr [1:5] "AIC" "BIC" "logLik" "deviance" ...
  $ call    : language lme4::lmer(formula = RT.log ~ FreqABCD.log.std + LogitABCD.neg.log.std + MIABCD.neg.log.std + AS.data$freq.sub.PC1 +  AS.data$freq.sub.PC2 + AS.data$freq.sub.PC3 + AS.data$freq.sub.PC4 + block + nletter.std + (1 | subj) +  ...
  $ residuals   : Named num [1:1742] 0.713 0.498 -0.361 -0.101 2.594 ...
   ..- attr(*, "names")= chr [1:1742] "1" "2" "3" "4" ...
  $ fitMsgs : chr(0)
  $ optinfo :List of 7
   ..$ optimizer: chr "bobyqa"
   ..$ control  :List of 1
   .. ..$ iprint: int 0
   ..$ derivs   :List of 2
   .. ..$ gradient: num [1:4] 9.81e-06 -5.34e-06 -1.60e-05 7.06e-05
   .. ..$ Hessian : num [1:4, 1:4] 245.9 28.5 3.3 -13.7 28.5 ...
   ..$ conv :List of 2
   .. ..$ opt : int 0
   .. ..$ lme4: list()
   ..$ feval    : int 107
   ..$ warnings : list()
   ..$ val  : num [1:4] 0.6919 0.2705 0.0314 0.223
  - attr(*, "class")= chr "summary.merMod"
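
As a rough sketch of how the `optinfo` fields printed above might be read for convergence: `conv$opt` appears to be the optimizer's return code (0 here), `conv$lme4` appears to hold the results of lme4's own convergence checks (empty here), and `warnings` collects optimizer warnings (empty here). The helper name `looks_converged` is hypothetical, and treating "code 0 and both lists empty" as convergence is an assumption, not an official lme4 criterion. The mock list copies only values shown in the output; for a fitted model the same list should be reachable through `summary(fit)$optinfo`.

```r
# Hypothetical helper: inspects an `optinfo` list shaped like the one above.
# Assumption: return code 0, an empty conv$lme4 list and an empty warnings
# list together indicate that no convergence problem was recorded.
looks_converged <- function(optinfo) {
  opt_ok  <- identical(as.integer(optinfo$conv$opt), 0L)  # optimizer return code
  lme4_ok <- length(optinfo$conv$lme4) == 0               # no lme4 check messages
  warn_ok <- length(optinfo$warnings) == 0                # no optimizer warnings
  opt_ok && lme4_ok && warn_ok
}

# Mock list reproducing only the fields printed in the str() output.
optinfo <- list(
  optimizer = "bobyqa",
  conv      = list(opt = 0L, lme4 = list()),
  feval     = 107L,
  warnings  = list()
)

looks_converged(optinfo)
```

A non-zero `conv$opt`, or any entries under `conv$lme4` or `warnings`, would make the helper return `FALSE` and would be the place to look for the actual message text.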

I'd appreciate any advice you may have!

Thank you,

Aleksander Główka
PhD Candidate
Department of Linguistics
Stanford University


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] bootstrap subject resampling: resampled subject codes surface as list/vector indices

2017-08-19 Thread Aleksander Główka

Thank you and apologies for not having posted the data along with the code.

After poking some more, I found the bug.

I first initialize sample.subjects as an empty list:

sample.subjects = list()

And then I try to assign to the first element of that empty list:

sample.subjects[1] = sample(unique(data$subj), 1, replace=TRUE,prob=NULL)

Needless to say, an empty list has no elements.

After changing this last line to:

sample.subjects = sample(unique(data$subj), 1, replace=TRUE,prob=NULL)

the code runs without issues. I actually don't need the initialization line. It 
only caused unnecessary confusion.
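
For readers who hit the same symptom, one plausible mechanism (an assumption; the thread does not confirm it) is that `data$subj` is a factor, so that some operations return its internal level codes instead of the subject labels. A minimal sketch with a made-up factor whose levels mirror the codes in `count.lookup`:

```r
# Made-up subject factor; the levels mirror the codes shown in count.lookup.
subj <- factor(c("5", "6", "13", "18", "20"),
               levels = c("5", "6", "13", "18", "20"))

picked <- subj[3]          # the subject labelled "13"

code  <- as.integer(picked)    # 3: position among the levels, not the label
label <- as.character(picked)  # "13": the subject code itself

# Sampling from the labels as character sidesteps the level codes entirely.
set.seed(1)
drawn <- sample(as.character(unique(subj)), 1)
```

Working with character labels throughout means that whatever is stored or compared is always the subject code itself.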

Thank you!

On 8/19/2017 7:15 PM, Bert Gunter wrote:

I didn't have the patience to go through your missive in detail, but do
note that it is not reproducible, as you have not provided a "data"
object. You **are** asked to provide a small reproducible example by
the posting guide.

Of course, others with more patience and/or more smarts may not need
the reprex to figure out what's going on. But if not ...

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sat, Aug 19, 2017 at 7:39 AM, Aleksander Główka  wrote:

I'm implementing a custom bootstrap resampling procedure in R. This
procedure resamples clusters of data points obtained by different subjects
in an experiment. Since the bootstrap samples need to have the same size as
the original dataset, `target.sample.size`, I select speakers and compute
their data point contributions to make sure I have a set of the right size.

 set.seed(1)
 target.sample.size = 1742
 count.lookup = rbind(levels(data$subj), as.numeric(table(data$subj)))

To this end, I create a dynamic list of resampled subjects,
`sample.subjects`, who keep being selected and appended to the list as
long as their summed data point contributions do not exceed
`target.sample.size`. To conveniently retrieve the number of data points
that a given subject contributes, I constructed a reference matrix,
`count.lookup`, where the first row contains subject codes and the second
row contains their respective data point counts.

 > count.lookup

 [,1]  [,2]  [,3]  [,4]  [,5]
 [1,] "5"   "6"   "13"  "18"  "20"
 [2,] "337" "202" "311" "740" "152"

This is how the resampling works:

 for (iter in 1:1000){

   #select first subject
   #empty list overwrites sample subjects from previous iteration
   sample.subjects = list()
   sample.subjects[1] = sample(unique(data$subj), 1, replace=TRUE, prob=NULL)

   #determine subject position in data point count lookup
   first.subj.pos = which(count.lookup[1,]==sample.subjects, arr.ind=TRUE)

   #add contribution of first subject to data point count
   sample.size = as.numeric(count.lookup[2,first.subj.pos])

   #select subject clusters until you exceed target sample size
   while(sample.size < target.sample.size){

     #add another subject
     current.subject = sample(unique(data$subj), 1, replace=TRUE, prob=NULL)
     sample.subjects[length(sample.subjects)+1] = current.subject

     #determine subject's position in data point lookup
     curr.subj.pos = which(count.lookup[1,]==current.subject, arr.ind=TRUE)

     #add subject contribution to the data point count
     sample.size = sample.size + as.numeric(count.lookup[2,curr.subj.pos])
   }

   #initialize intermediate data frame; intermediate because it will be
   #shortened to fit target size
   inter.set = data.frame(matrix(, nrow = 0, ncol = ncol(data)))

   #build the bootstrap sample from the selected subjects
   for(j in 1:length(sample.subjects)){

     inter.set = rbind(inter.set, data[data$subj == sample.subjects[j],])

   }

   #procrustean bed of target sample size
   final.set = inter.set[1:target.sample.size,]

   write.csv(final.set, paste("bootstrap_sample_", iter,".csv", sep=""), row.names=FALSE)
   cat("Bootstrap Iteration", iter, "completed\n")

   #clean up sample.size for next bootstrap iteration
   sample.size = 0

 }

My problem is that when I sample the second subject onward and add it to
`sample.subjects` (regardless of whether it is a list or a vector), what
actually gets added to `sample.subjects` seems to be the index of that
subject in `count.lookup`! When I select the first subject code and create a
list consisting of just that subject code as the only element, everything is
fine.

 > sample.subjects[1] = sample(unique(tt1$subj), 1, replace=TRUE,
prob=NULL)
 > sample.subjects
 [[1]]
 [1] 5

I know this is the actual subject number because when I check the number

[R] bootstrap subject resampling: resampled subject codes surface as list/vector indices

2017-08-19 Thread Aleksander Główka

I'm implementing a custom bootstrap resampling procedure in R. This 
procedure resamples clusters of data points obtained by different 
subjects in an experiment. Since the bootstrap samples need to have the 
same size as the original dataset, `target.sample.size`, I select speakers 
and compute their data point contributions to make sure I have a set of 
the right size.


set.seed(1)
target.sample.size = 1742
count.lookup = rbind(levels(data$subj), as.numeric(table(data$subj)))

To this end, I create a dynamic list of resampled subjects, 
`sample.subjects`, who keep being selected and appended to the list 
as long as their summed data point contributions do not exceed 
`target.sample.size`. To conveniently retrieve the number of data points 
that a given subject contributes, I constructed a reference matrix, 
`count.lookup`, where the first row contains subject codes and the 
second row contains their respective data point counts.


> count.lookup

[,1]  [,2]  [,3]  [,4]  [,5]
[1,] "5"   "6"   "13"  "18"  "20"
[2,] "337" "202" "311" "740" "152"
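
As an aside, the lookup printed above stores the counts as character strings, because `rbind()` of labels and numbers yields a character matrix. A possible alternative, sketched here under the assumption that `data$subj` is a factor, is a named integer vector indexed by subject code. The data frame below is a made-up stand-in built only to match the counts shown.

```r
# Made-up stand-in for `data`: just the subject column, with the counts above.
codes <- c("5", "6", "13", "18", "20")
data <- data.frame(
  subj = factor(rep(codes, times = c(337, 202, 311, 740, 152)), levels = codes)
)

# table() counts rows per subject; its names are the subject codes.
tab <- table(data$subj)
count.lookup <- setNames(as.integer(tab), names(tab))

n.18  <- count.lookup[["18"]]   # rows contributed by subject "18"
total <- sum(count.lookup)      # total number of rows
```

With this layout a count is retrieved by name, so no `which()` search over a row of codes is needed.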

This is how the resampling works:

for (iter in 1:1000){

  #select first subject
  #empty list overwrites sample subjects from previous iteration
  sample.subjects = list()
  sample.subjects[1] = sample(unique(data$subj), 1, replace=TRUE, prob=NULL)

  #determine subject position in data point count lookup
  first.subj.pos = which(count.lookup[1,]==sample.subjects, arr.ind=TRUE)

  #add contribution of first subject to data point count
  sample.size = as.numeric(count.lookup[2,first.subj.pos])

  #select subject clusters until you exceed target sample size
  while(sample.size < target.sample.size){

    #add another subject
    current.subject = sample(unique(data$subj), 1, replace=TRUE, prob=NULL)
    sample.subjects[length(sample.subjects)+1] = current.subject

    #determine subject's position in data point lookup
    curr.subj.pos = which(count.lookup[1,]==current.subject, arr.ind=TRUE)

    #add subject contribution to the data point count
    sample.size = sample.size + as.numeric(count.lookup[2,curr.subj.pos])

  }

  #initialize intermediate data frame; intermediate because it will be
  #shortened to fit target size
  inter.set = data.frame(matrix(, nrow = 0, ncol = ncol(data)))

  #build the bootstrap sample from the selected subjects
  for(j in 1:length(sample.subjects)){

    inter.set = rbind(inter.set, data[data$subj == sample.subjects[j],])

  }

  #procrustean bed of target sample size
  final.set = inter.set[1:target.sample.size,]

  write.csv(final.set, paste("bootstrap_sample_", iter,".csv", sep=""), row.names=FALSE)

  cat("Bootstrap Iteration", iter, "completed\n")

  #clean up sample.size for next bootstrap iteration
  sample.size = 0

}
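
For comparison, a single iteration of the procedure above might be sketched as follows, keeping subject codes as character strings throughout. This is an illustration under assumptions, not the posted code: `data` is a made-up stand-in with a hypothetical response column `y`, and the counts come from a named vector instead of the `count.lookup` matrix.

```r
set.seed(1)

# Made-up stand-in for `data`; `y` is a hypothetical response column.
codes <- c("5", "6", "13", "18", "20")
data <- data.frame(
  subj = factor(rep(codes, times = c(337, 202, 311, 740, 152)), levels = codes),
  y    = rnorm(1742)
)

target.sample.size <- nrow(data)
subject.codes <- as.character(unique(data$subj))
tab <- table(data$subj)
counts <- setNames(as.integer(tab), names(tab))   # rows per subject, by code

# Draw subjects with replacement until their rows reach the target size.
sample.subjects <- character(0)
sample.size <- 0
while (sample.size < target.sample.size) {
  current.subject <- sample(subject.codes, 1)
  sample.subjects <- c(sample.subjects, current.subject)
  sample.size <- sample.size + counts[[current.subject]]
}

# Stack each sampled subject's rows, then trim to the target size.
inter.set <- do.call(rbind, lapply(sample.subjects,
                                   function(s) data[data$subj == s, ]))
final.set <- inter.set[seq_len(target.sample.size), ]
```

Building the stacked set with `lapply()` and one `rbind()` call avoids growing a data frame inside a loop, and `seq_len()` trims to exactly `target.sample.size` rows.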

My problem is that when I sample the second subject onward and add it to 
`sample.subjects` (regardless of whether it is a list or a vector), what 
actually gets added to `sample.subjects` seems to be the index of that 
subject in `count.lookup`! When I select the first subject code and 
create a list consisting of just that subject code as the only element, 
everything is fine.


> sample.subjects[1] = sample(unique(tt1$subj), 1, replace=TRUE, 
prob=NULL)

> sample.subjects
[[1]]
[1] 5

I know this is the actual subject number because when I check the number 
of data points that this subject contributes in `count.lookup`, it is 
the number that corresponds to subject 5.


> sample.size = as.numeric(tt1.lookup[2,first.subj.pos])
> sample.size

However, when I append further sampled subject codes to the list, for 
some reason they surface as their index number in count.lookup.


> sample.subjects
[[1]]
[1] 5

[[2]]
[1] 5

[[3]]
[1] 1

[[4]]
[1] 2

[[5]]
[1] 5

[[6]]
[1] 2

[[7]]
[1] 2

[[8]]
[1] 3

[[9]]
[1] 3

The third element, for example, is 1. This coincides with none of the 
subject codes in count.lookup.


It seems the problem lies in how I append to `sample.subjects`. I tried 
both vectors and list as data structures in which to store sampled 
subject codes. For each data type, I tried two ways of appending: the 
one I present above, and one that is more idiomatic in R:


sampled.subjects = c(list(current.subject), sampled.subjects) (for lists)

and

sampled.subjects = c(current.subject, sampled.subjects) (for vectors)

Are these appending strategies flawed here, or is there some stupid error 
I'm making somewhere else that is making the indices surface instead 
of the subject codes?
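
For reference, the two appending idioms asked about can be sketched in isolation as below, using plain character codes so that no factor is involved. The variable names `ids` and `id.list` are hypothetical.

```r
# Appending to a character vector with c().
ids <- character(0)
ids <- c(ids, "13")
ids <- c(ids, "5")

# Appending to a list: either assign one past the end with [[ ]],
# or concatenate with another list.
id.list <- list()
id.list[[length(id.list) + 1]] <- "13"
id.list <- c(id.list, list("5"))
```

Both forms preserve the appended values when those values are character strings.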


I'd appreciate all your help!

