Re: [R] Conditional looping over a set of variables in R

2010-10-27 Thread David Herzberg
Peter, thanks for this elegant solution that works well and handles the empty 
cases. However, the vector it returns includes both the row (case) numbers and 
the target result (number of column of first 1). How can I strip out the row 
numbers and leave only the target result?

Regards,

David S. Herzberg, Ph.D.
Vice President, Research and Development 
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: dav...@wpspublish.com



-----Original Message-----
From: Peter Ehlers [mailto:ehl...@ucalgary.ca] 
Sent: Tuesday, October 26, 2010 9:23 AM
To: David Herzberg
Cc: Petr PIKAL; r-help@r-project.org
Subject: Re: [R] Conditional looping over a set of variables in R

I would still recommend

  vector_of_column_number <- apply(yourdata, 1, match, x=1)

as the simplest way if you only want the number of the column that has the 
first 1 or "1" (the call works as is for both numeric and character data). Rows 
which have no 1s will return a value of NA.

Anything wrong with it?

   -Peter Ehlers


Re: [R] Conditional looping over a set of variables in R

2010-10-27 Thread Peter Ehlers

On 2010-10-27 06:21, David Herzberg wrote:

Peter, thanks for this elegant solution that works well and handles the empty cases. 
However, the vector it returns includes both the row (case) numbers and the target result 
(number of column of first 1). How can I strip out the row numbers and leave 
only the target result?


Use unname(x) or as.vector(x) on the result.

  -Peter Ehlers
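
For readers following along, here is a minimal sketch of where the row numbers come from and what unname() does about them. The small `scores` matrix and its row names are made up for illustration; they are not David's data.

```r
# Hypothetical 3 x 4 score matrix; the row names stand in for case numbers.
scores <- matrix(c( 0,  1,  0,  1,
                   NA, NA, NA, NA,
                    0,  0,  0,  1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("1", "2", "3"), NULL))

# apply() carries the row names over as names of the result.
first_one <- apply(scores, 1, match, x = 1)

# unname() keeps the values and drops the names;
# as.vector() has the same effect on an atomic vector like this one.
plain <- unname(first_one)
print(first_one)
print(plain)
```

The names are only labels on the vector, so the values themselves are unchanged by either call.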



Re: [R] Conditional looping over a set of variables in R

2010-10-26 Thread Petr PIKAL
Hi

r-help-boun...@r-project.org wrote on 25.10.2010 20:41:55:

 Adrienne, there's one glitch when I implement your solution below. When the
 loop encounters a case with no data at all (that is, all 140 item responses
 are missing), it aborts and prints this error message:  ERROR: argument is
 of length zero.
 
 I wonder if there's a logical condition I could add that would enable R to
 skip these empty cases and continue executing on the next case that
 contains data.
 
 Thanks, Dave
 
 David S. Herzberg, Ph.D.
 Vice President, Research and Development
 Western Psychological Services
 12031 Wilshire Blvd.
 Los Angeles, CA 90025-1251
 Phone: (310)478-2061 x144
 FAX: (310)478-7838
 email: dav...@wpspublish.com
 
 
 
 From: wootten.adrie...@gmail.com [mailto:wootten.adrie...@gmail.com] On Behalf
 Of Adrienne Wootten
 Sent: Friday, October 22, 2010 9:09 AM
 To: David Herzberg
 Cc: r-help@r-project.org
 Subject: Re: [R] Conditional looping over a set of variables in R
 
 David,
 
 here I'm referring to your data as testmat, a matrix of 140 columns and 1500
 rows, but the same or similar notation can be applied to data frames in R.  If
 I understand correctly, you are looking for the first response (column) where
 you got a value of 1.  I'm assuming also that since your missing values are
 characters then your two numeric values are also characters.  keeping all this
 in mind, try something like this.

If you really only want to know which column in each row has the first 
occurrence of 1 (or any other value), you can get rid of looping and use 
other R capabilities.

> set.seed(111)
> mat <- matrix(sample(1:3, 20, replace=T),5,4)
> mat
     [,1] [,2] [,3] [,4]
[1,]    2    2    2    2
[2,]    3    1    2    1
[3,]    2    2    1    3
[4,]    2    2    1    1
[5,]    2    1    1    2
> mat.w <- which(mat==1, arr.ind=T)
> tapply(mat.w[,2], mat.w[,1], min)
2 3 4 5 
2 3 3 2 
> mat[2, ] <- NA
> mat
     [,1] [,2] [,3] [,4]
[1,]    2    2    2    2
[2,]   NA   NA   NA   NA
[3,]    2    2    1    3
[4,]    2    2    1    1
[5,]    2    1    1    2

and this approach works smoothly with NA values too

> mat.w <- which(mat==1, arr.ind=T)
> tapply(mat.w[,2], mat.w[,1], min)
3 4 5 
3 3 2 

You can then modify such output, as you have info about columns and 
rows. I am sure there are other, maybe better, options, e.g.

lll <- as.list(as.data.frame(t(mat)))
> unlist(lapply(lll, function(x) min(which(x==1))))
 V1  V2  V3  V4  V5 
Inf Inf   3   3   2 

Regards
Petr
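
One possible way to "modify such output" is to spread the tapply() result over a vector with one entry per row, so that rows without a 1 show up as NA. This is only a sketch: the matrix below is the example above typed in by hand (with row 2 already set to NA), and the names `first_by_row` and `first` are made up for illustration.

```r
# The example matrix from above, entered by hand, row 2 already set to NA.
mat <- matrix(c( 2,  2,  2,  2,
                NA, NA, NA, NA,
                 2,  2,  1,  3,
                 2,  2,  1,  1,
                 2,  1,  1,  2),
              nrow = 5, byrow = TRUE)

# Row/column positions of every 1; NA cells are simply not reported.
mat.w <- which(mat == 1, arr.ind = TRUE)

# Smallest column index per row, only for rows that contain a 1.
first_by_row <- tapply(mat.w[, "col"], mat.w[, "row"], min)

# One slot per row, NA by default, filled in where a 1 was found.
first <- rep(NA_integer_, nrow(mat))
first[as.integer(names(first_by_row))] <- first_by_row
print(first)
```

The names of the tapply() result are the row numbers, which is what makes the last assignment possible.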

 

Re: [R] Conditional looping over a set of variables in R

2010-10-26 Thread David Herzberg

Thank you - I will try this solution as well.

Sent via DROID X



Re: [R] Conditional looping over a set of variables in R

2010-10-26 Thread Peter Ehlers

I would still recommend

 vector_of_column_number <- apply(yourdata, 1, match, x=1)

as the simplest way if you only want the number of the
column that has the first 1 or "1" (the call works as is
for both numeric and character data). Rows which have no
1s will return a value of NA.

Anything wrong with it?

  -Peter Ehlers
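
To make the claim about numeric and character data concrete, here is a small sketch of the same call on two toy matrices. The matrices `num` and `chr` are invented for illustration; `chr` assumes missing values were read in as the string ".".

```r
# Toy numeric data: row 2 is entirely missing.
num <- matrix(c( 2,  1,  0,  1,
                NA, NA, NA, NA,
                 0,  0,  1,  1),
              nrow = 3, byrow = TRUE)

# Toy character data, assuming "." was read in as a string for missing.
chr <- matrix(c(".", "0", "1", "1",
                ".", ".", ".", ".",
                "1", "0", "0", "0"),
              nrow = 3, byrow = TRUE)

# match(x = 1, table = <row>) gives the position of the first 1 in the row,
# or NA when there is none. For the character matrix, match() compares
# the 1 as the string "1".
first_num <- apply(num, 1, match, x = 1)
first_chr <- apply(chr, 1, match, x = 1)
print(first_num)
print(first_chr)
```

Because x = 1 is passed by name, each row handed over by apply() ends up as the `table` argument of match().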


Re: [R] Conditional looping over a set of variables in R

2010-10-25 Thread David Herzberg
Adrienne, there's one glitch when I implement your solution below. When the 
loop encounters a case with no data at all (that is, all 140 item responses are 
missing), it aborts and prints this error message:  ERROR:  argument is of 
length zero.

I wonder if there's a logical condition I could add that would enable R to skip 
these empty cases and continue executing on the next case that contains data.

Thanks, Dave

David S. Herzberg, Ph.D.
Vice President, Research and Development
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: dav...@wpspublish.com



From: wootten.adrie...@gmail.com [mailto:wootten.adrie...@gmail.com] On Behalf 
Of Adrienne Wootten
Sent: Friday, October 22, 2010 9:09 AM
To: David Herzberg
Cc: r-help@r-project.org
Subject: Re: [R] Conditional looping over a set of variables in R

David,

here I'm referring to your data as testmat, a matrix of 140 columns and 1500 
rows, but the same or similar notation can be applied to data frames in R.  If 
I understand correctly, you are looking for the first response (column) where 
you got a value of 1.  I'm assuming also that since your missing values are 
characters then your two numeric values are also characters.  keeping all this 
in mind, try something like this.

first = c() # your extra variable which will eventually contain the first correct response for each case

for(i in 1:nrow(testmat)){

  c = 1

  while( c<=ncol(testmat) | testmat[i,c] != 1 ){

    if( testmat[i,c] == 1){

      first[i] = c
      break # will exit the while loop once it finds the first correct answer, and then jump to the next case

    } else {

      c=c+1 # proceed to the next column if not

    }

  }

}


Hope this helps you out a bit.

Adrienne Wootten
NCSU
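
For comparison, here is a sketch of a loop along the same lines that is meant to tolerate missing values and rows that never contain a 1. It is not Adrienne's code: it assumes missing responses are stored as NA, the toy `testmat` is invented, and the inner index is called `j` to avoid masking R's c() function.

```r
# Toy data: row 2 is entirely missing, row 3 has no correct response.
testmat <- matrix(c(NA,  0,  1,  1,
                    NA, NA, NA, NA,
                     0,  0,  0,  0,
                     1,  0,  1,  0),
                  nrow = 4, byrow = TRUE)

# Pre-fill with NA so rows without a 1 keep a defined value.
first <- rep(NA_integer_, nrow(testmat))

for (i in seq_len(nrow(testmat))) {
  for (j in seq_len(ncol(testmat))) {
    # isTRUE() treats NA == 1 as "not a match" instead of raising an error.
    if (isTRUE(testmat[i, j] == 1)) {
      first[i] <- j
      break  # stop at the first correct response and move to the next case
    }
  }
}
print(first)
```

Looping over seq_len(ncol(testmat)) also keeps the column index from running past the last column.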

On Fri, Oct 22, 2010 at 11:33 AM, David Herzberg dav...@wpspublish.com wrote:
Here's the problem I'm trying to solve in R: I have a data frame that consists 
of about 1500 cases (rows) of data from kids who took a test of listening 
comprehension. The columns are their scores (1 = correct, 0 = incorrect,  . = 
missing) on 140 test items. The items are numbered sequentially and are ordered 
by increasing difficulty as you go from left to right across the columns. I 
want R to go through the data and find the first correct response for each 
case. Because of basal and ceiling rules, many cases have missing data on many 
items before the first correct response appears.

For each case, I want R to evaluate the item responses sequentially starting 
with item 1. If the score is 0 or missing, proceed to the next item and 
evaluate it. If the score is 1, stop the operation for that case, record the 
item number of that first correct response in a new variable, proceed to the 
next case, and restart the operation.

In SPSS, this operation would be carried out with LOOP, VECTOR, and DO IF, as 
follows (assuming the data set is already loaded):

* DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST CORRECT RESPONSE, 
SET IT EQUAL TO 0.
numeric LCfirst1.
comp LCfirst1 = 0.

* DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
vector x=LC1a_score to LC140a_score.

* SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS LCfirst1 = 0. #i IS 
AN INDEX VARIABLE THAT INCREASES BY 1 EACH TIME THE LOOP RUNS.
loop #i=1 to 140 if (LCfirst1 = 0).

* SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR EACH ELEMENT OF THE 
VECTOR.  THUS, WHEN #i = 1, THE EXPRESSION EVALUATES THE FIRST ELEMENT OF THE 
VECTOR (THAT IS, THE FIRST OF THE 140 ITEM RESPONSES). AS THE LOOP RUNS AND #i 
INCREASES, SUBSEQUENT VECTOR ELEMENTS ARE EVALUATED. THE do if STATEMENT 
RETAINS CONTROL AND KEEPS LOOPING THROUGH THE VECTOR UNTIL A '1' IS ENCOUNTERED.
+ do if x(#i) = 1.

* WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT STATEMENT, WHICH 
RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
+ comp x(#i) = 99.

* AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH RECODES THE VALUE OF 
LCfirst1 TO THE CURRENT INDEX VALUE, THUS CAPTURING THE ITEM NUMBER OF THE 
FIRST CORRECT RESPONSE FOR THAT CASE. CHANGING THE VALUE OF LCfirst1 ALSO 
CAUSES THE LOOP TO STOP EXECUTING FOR THAT CASE, AND THE PROGRAM MOVES TO THE NEXT 
CASE AND RESTARTS THE LOOP.
+ comp LCfirst1 = #i.
+ end if.
end loop.
exe.

After several hours of trying to translate this procedure to R, I'm stumped. I 
played around with creating a list to hold the item responses variables 
(analogous to 'vector' in SPSS), but when I tried to use the list in an R 
procedure, I kept getting a warning along the lines of 'the list contains > 1 
element, only the first element will be used'. So perhaps a list is not the 
appropriate class to 'hold' these variables?

It seems that some nested arrangement of 'for' 'while' and/or 'lapply' will 
allow me to recreate the operation described above? How do I set up the 
indexing operation analogous to 'loop #i' in SPSS?

Any help is appreciated, and I'm happy

Re: [R] Conditional looping over a set of variables in R

2010-10-24 Thread David Herzberg
Adrienne - this solves the problem nicely. Thanks for your help.


David S. Herzberg, Ph.D.
Vice President, Research and Development
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: dav...@wpspublish.com



From: wootten.adrie...@gmail.com [mailto:wootten.adrie...@gmail.com] On Behalf 
Of Adrienne Wootten
Sent: Friday, October 22, 2010 9:09 AM
To: David Herzberg
Cc: r-help@r-project.org
Subject: Re: [R] Conditional looping over a set of variables in R

David,

here I'm referring to your data as testmat, a matrix of 140 columns and 1500 
rows, but the same or similar notation can be applied to data frames in R.  If 
I understand correctly, you are looking for the first response (column) where 
you got a value of 1.  I'm assuming also that since your missing values are 
characters then your two numeric values are also characters.  keeping all this 
in mind, try something like this.

first = c() # your extra variable which will eventually contain the first 
correct response for each case

for(i in 1:nrow(testmat)){

c = 1

while( c=ncol(testmat) | testmat[i,c] != 1 ){

if( testmat[i,c] == 1){

first[i] = c
break # will exit the while loop once it finds the first correct answer, and 
then jump to the next case

 } else {

c=c+1 # procede to the next column if not

}

}

}


Hope this helps you out a bit.

Adrienne Wootten
NCSU

On Fri, Oct 22, 2010 at 11:33 AM, David Herzberg 
dav...@wpspublish.commailto:dav...@wpspublish.com wrote:
Here's the problem I'm trying to solve in R: I have a data frame that consists 
of about 1500 cases (rows) of data from kids who took a test of listening 
comprehension. The columns are their scores (1 = correct, 0 = incorrect,  . = 
missing) on 140 test items. The items are numbered sequentially and are ordered 
by increasing difficulty as you go from left to right across the columns. I 
want R to go through the data and find the first correct response for each 
case. Because of basal and ceiling rules, many cases have missing data on many 
items before the first correct response appears.

For each case, I want R to evaluate the item responses sequentially starting 
with item 1. If the score is 0 or missing, proceed to the next item and 
evaluate it. If the score is 1, stop the operation for that case, record the 
item number of that first correct response in a new variable, proceed to the 
next case, and restart the operation.

In SPSS, this operation would be carried out with LOOP, VECTOR, and DO IF, as 
follows (assuming the data set is already loaded):

* DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST CORRECT RESPONSE, 
SET IT EQUAL TO 0.
numeric LCfirst1.
comp LCfirst1 = 0

* DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
vector x=LC1a_score to LC140a_score.

* SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS LCfirst1 = 0. #i IS 
AN INDEX VARIABLE THAT INCREASES BY 1 EACH TIME THE LOOP RUNS.
loop #i=1 to 140 if (LCfirst1 = 0).

* SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR EACH ELEMENT OF THE 
VECTOR.  THUS, WHEN #i = 1, THE EXPRESSION EVALUATES THE FIRST ELEMENT OF THE 
VECTOR (THAT IS, THE FIRST OF THE 140 ITEM RESPONSES). AS THE LOOP RUNS AND #i 
INCREASES, SUBSEQUENT VECTOR ELEMENTS ARE EVALUATED. THE do if STATEMENT 
RETAINS CONTROL AND KEEPS LOOPING THROUGH THE VECTOR UNTIL A '1' IS ENCOUNTERED.
+ do if x(#i) = 1.

* WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT STATEMENT, WHICH 
RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
+ comp x(#i) = 99.

* AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH RECODES THE VALUE OF 
LCfirst1 TO THE CURRENT INDEX VALUE, THUS CAPTURING THE ITEM NUMBER OF THE 
FIRST CORRECT RESPONSE FOR THAT CASE. CHANGING THE VALUE OF LCfirst1 ALSO 
CAUSES THE LOOP TO STOP EXECUTING FOR THAT CASE, AND THE PROGRAM MOVES TO THE NEXT 
CASE AND RESTARTS THE LOOP.
+ comp LCfirst1 = #i.
+ end if.
end loop.
exe.

After several hours of trying to translate this procedure to R, I'm stumped. I 
played around with creating a list to hold the item response variables 
(analogous to 'vector' in SPSS), but when I tried to use the list in an R 
procedure, I kept getting a warning along the lines of 'the list contains > 1 
element, only the first element will be used'. So perhaps a list is not the 
appropriate class to 'hold' these variables?

It seems that some nested arrangement of 'for' 'while' and/or 'lapply' will 
allow me to recreate the operation described above? How do I set up the 
indexing operation analogous to 'loop #i' in SPSS?

Any help is appreciated, and I'm happy to provide more information if needed.

David S. Herzberg, Ph.D.
Vice President, Research and Development
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: dav...@wpspublish.com




Re: [R] Conditional looping over a set of variables in R

2010-10-24 Thread Peter Ehlers

This won't be as quick as Bill's elegant solution, but it's a one-liner:

 apply(d, 1, function(x), match(1, x))

See ?match.

  -Peter Ehlers

On 2010-10-22 10:36, David Herzberg wrote:

Bill, thanks so much for this. I'll get a chance to test it later today, and 
will post the outcome.


David S. Herzberg, Ph.D.
Vice President, Research and Development
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: dav...@wpspublish.com



-Original Message-
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Friday, October 22, 2010 9:52 AM
To: David Herzberg; r-help@r-project.org
Subject: RE: [R] Conditional looping over a set of variables in R

You were a bit vague about the format of your data.
I'm assuming all columns were numeric and the entries are one of 0, 1, and NA 
(missing value).  I made a little function to generate random data of that 
format for testing purposes:

makeData <- function (nrow = 1500, ncol = 140, pMissing = 0.1) {
 # pMissing is the proportion of missing values
 m <- matrix(sample(c(1, 0), size = nrow * ncol, replace = TRUE),
             nrow, ncol)
 m[runif(nrow * ncol) < pMissing] <- NA
 data.frame(m)
}

E.g.,

 set.seed(168)
 d <- makeData(15,3)
 d
   X1 X2 X3
1   1  1  1
2   0  0 NA
3   0  1  0
4   0  0 NA
5   0  1  1
6   0  0 NA
7   1  0  0
8   0  1  1
9   0  0  1
10  1  1 NA
11  0  0  1
12  0  0  0
13 NA NA NA
14  0  0  0
15  1  0  0

I think the following function does what you want.
The algorithm is pretty similar to what you showed.

   columnOfFirstOne <- function(data) {
       # col will be return value, one entry per row of data.
       # Fill it with NA's: NA in output will mean there were no 1's in row
       col <- rep(as.integer(NA), nrow(data))
       for (j in seq_len(ncol(data))) { # loop over columns
           # For each entry in 'col', if it has not been set yet
           # and this entry in the j'th column of data is 1 (and not
           # missing) then set to the column number.
           col[is.na(col) & !is.na(data[, j]) & data[, j] == 1] <- j
       }
       col # return this from function
   }

With the above data we get
 columnOfFirstOne(d)
[1]  1 NA  2 NA  2 NA  1  2  3  1  3 NA NA NA  1

It seems quick enough for a dataset of your size
 dd <- makeData(nrow=1500, ncol=140)
 system.time(columnOfFirstOne(dd)) # time in seconds
   user  system elapsed
   0.08    0.00    0.08

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of David Herzberg
Sent: Friday, October 22, 2010 8:34 AM
To: r-help@r-project.org
Subject: [R] Conditional looping over a set of variables in R

Here's the problem I'm trying to solve in R: I have a data frame that
consists of about 1500 cases (rows) of data from kids who took a test
of listening comprehension. The columns are their scores (1 = correct,
0 = incorrect,  . = missing) on 140 test items. The items are numbered
sequentially and are ordered by increasing difficulty as you go from
left to right across the columns. I want R to go through the data and
find the first correct response for each case. Because of basal and
ceiling rules, many cases have missing data on many items before the
first correct response appears.

For each case, I want R to evaluate the item responses sequentially
starting with item 1. If the score is 0 or missing, proceed to the
next item and evaluate it. If the score is 1, stop the operation for
that case, record the item number of that first correct response in a
new variable, proceed to the next case, and restart the operation.

In SPSS, this operation would be carried out with LOOP, VECTOR, and DO
IF, as follows (assuming the data set is already loaded):

* DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST CORRECT
RESPONSE, SET IT EQUAL TO 0.
numeric LCfirst1.
comp LCfirst1 = 0

* DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
vector x=LC1a_score to LC140a_score.

* SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS
LCfirst1 = 0. #i IS AN INDEX VARIABLE THAT INCREASES BY 1 EACH TIME
THE LOOP RUNS.
loop #i=1 to 140 if (LCfirst1 = 0).

* SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR EACH
ELEMENT OF THE VECTOR.  THUS, WHEN #i = 1, THE EXPRESSION EVALUATES
THE FIRST ELEMENT OF THE VECTOR (THAT IS, THE FIRST OF THE 140 ITEM
RESPONSES). AS THE LOOP RUNS AND #i INCREASES, SUBSEQUENT VECTOR
ELEMENTS ARE EVALUATED.
THE do if STATEMENT RETAINS CONTROL AND KEEPS LOOPING THROUGH THE
VECTOR UNTIL A '1' IS ENCOUNTERED.
+ do if x(#i) = 1.

* WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT STATEMENT,
WHICH RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
+ comp x(#i) = 99.

* AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH RECODES THE
VALUE OF LCfirst1 TO THE CURRENT INDEX VALUE

Re: [R] Conditional looping over a set of variables in R

2010-10-24 Thread Peter Ehlers

Whoops, got an extra comma in there somehow; should be:

  apply(d, 1, function(x) match(1, x))
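
A small sketch of what match() returns in the cases that matter for this thread. The vectors and variable names below are made up for illustration and are not the poster's data:

```r
# Position of the first element equal to 1; NA entries are skipped over.
pos_found   <- match(1, c(0, NA, 1, 1))

# No 1 anywhere in the vector, so the result is NA.
pos_missing <- match(1, c(0, NA, 0))

# Character data: match() coerces both arguments to a common type,
# so the number 1 matches the string "1" and "." is simply a non-match.
pos_char    <- match(1, c("0", ".", "1"))

print(c(pos_found, pos_missing, pos_char))
```

Wrapping the same call in apply(d, 1, ...) repeats this lookup once per row.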

  -Peter Ehlers

On 2010-10-24 08:17, Peter Ehlers wrote:

This won't be as quick as Bill's elegant solution, but it's a one-liner:

   apply(d, 1, function(x), match(1, x))

See ?match.

-Peter Ehlers

On 2010-10-22 10:36, David Herzberg wrote:

Bill, thanks so much for this. I'll get a chance to test it later today, and 
will post the outcome.


David S. Herzberg, Ph.D.
Vice President, Research and Development
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: dav...@wpspublish.com



-Original Message-
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Friday, October 22, 2010 9:52 AM
To: David Herzberg; r-help@r-project.org
Subject: RE: [R] Conditional looping over a set of variables in R

You were a bit vague about the format of your data.
I'm assuming all columns were numeric and the entries are one of 0, 1, and NA 
(missing value).  I made a little function to generate random data of that 
format for testing purposes:

makeData <- function (nrow = 1500, ncol = 140, pMissing = 0.1) {
  # pMissing is the proportion of missing values
  m <- matrix(sample(c(1, 0), size = nrow * ncol, replace = TRUE),
              nrow, ncol)
  m[runif(nrow * ncol) < pMissing] <- NA
  data.frame(m)
}

E.g.,

   set.seed(168)
   d <- makeData(15,3)
   d
    X1 X2 X3
 1   1  1  1
 2   0  0 NA
 3   0  1  0
 4   0  0 NA
 5   0  1  1
 6   0  0 NA
 7   1  0  0
 8   0  1  1
 9   0  0  1
10   1  1 NA
11   0  0  1
12   0  0  0
13  NA NA NA
14   0  0  0
15   1  0  0

I think the following function does what you want.
The algorithm is pretty similar to what you showed.

columnOfFirstOne <- function(data) {
    # col will be return value, one entry per row of data.
    # Fill it with NA's: NA in output will mean there were no 1's in row
    col <- rep(as.integer(NA), nrow(data))
    for (j in seq_len(ncol(data))) { # loop over columns
        # For each entry in 'col', if it has not been set yet
        # and this entry in the j'th column of data is 1 (and not
        # missing) then set to the column number.
        col[is.na(col) & !is.na(data[, j]) & data[, j] == 1] <- j
    }
    col # return this from function
}

With the above data we get
   columnOfFirstOne(d)
 [1]  1 NA  2 NA  2 NA  1  2  3  1  3 NA NA NA  1

It seems quick enough for a dataset of your size
   dd <- makeData(nrow=1500, ncol=140)
   system.time(columnOfFirstOne(dd)) # time in seconds
   user  system elapsed
   0.08    0.00    0.08

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of David Herzberg
Sent: Friday, October 22, 2010 8:34 AM
To: r-help@r-project.org
Subject: [R] Conditional looping over a set of variables in R

Here's the problem I'm trying to solve in R: I have a data frame that
consists of about 1500 cases (rows) of data from kids who took a test
of listening comprehension. The columns are their scores (1 = correct,
0 = incorrect,  . = missing) on 140 test items. The items are numbered
sequentially and are ordered by increasing difficulty as you go from
left to right across the columns. I want R to go through the data and
find the first correct response for each case. Because of basal and
ceiling rules, many cases have missing data on many items before the
first correct response appears.

For each case, I want R to evaluate the item responses sequentially
starting with item 1. If the score is 0 or missing, proceed to the
next item and evaluate it. If the score is 1, stop the operation for
that case, record the item number of that first correct response in a
new variable, proceed to the next case, and restart the operation.

In SPSS, this operation would be carried out with LOOP, VECTOR, and DO
IF, as follows (assuming the data set is already loaded):

* DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST CORRECT
RESPONSE, SET IT EQUAL TO 0.
numeric LCfirst1.
comp LCfirst1 = 0

* DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
vector x=LC1a_score to LC140a_score.

* SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS
LCfirst1 = 0. #i IS AN INDEX VARIABLE THAT INCREASES BY 1 EACH TIME
THE LOOP RUNS.
loop #i=1 to 140 if (LCfirst1 = 0).

* SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR EACH
ELEMENT OF THE VECTOR.  THUS, WHEN #i = 1, THE EXPRESSION EVALUATES
THE FIRST ELEMENT OF THE VECTOR (THAT IS, THE FIRST OF THE 140 ITEM
RESPONSES). AS THE LOOP RUNS AND #i INCREASES, SUBSEQUENT VECTOR
ELEMENTS ARE EVALUATED.
THE do if STATEMENT RETAINS CONTROL AND KEEPS LOOPING THROUGH THE
VECTOR UNTIL A '1' IS ENCOUNTERED.
+ do if x(#i) = 1.

* WHEN A '1' IS ENCOUNTERED, CONTROL PASSES

Re: [R] Conditional looping over a set of variables in R

2010-10-24 Thread Gabor Grothendieck
On Sun, Oct 24, 2010 at 2:54 PM, Peter Ehlers ehl...@ucalgary.ca wrote:
 Whoops, got an extra comma in there somehow; should be:

  apply(d, 1, function(x) match(1, x))


A slight variation on this would be:

   apply(d, 1, match, x = 1)
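
A minimal sketch of why the two spellings agree, and of how to drop row labels from the result when they are present. The data frame and its row names are hypothetical, chosen only so that the labels are visible:

```r
# Hypothetical data with explicit row names so the labels show up.
d <- data.frame(X1 = c(0, 1, NA), X2 = c(1, 0, NA), X3 = c(1, 1, 0),
                row.names = c("case1", "case2", "case3"))

# apply() passes each row as the first unnamed argument; because x = 1 is
# already supplied by name, the row fills match()'s 'table' argument.
via_args <- apply(d, 1, match, x = 1)
via_anon <- apply(d, 1, function(x) match(1, x))
stopifnot(identical(via_args, via_anon))

# unname() drops the row labels and leaves only the column positions.
print(unname(via_args))
```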


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Conditional looping over a set of variables in R

2010-10-22 Thread David Herzberg
Here's the problem I'm trying to solve in R: I have a data frame that consists 
of about 1500 cases (rows) of data from kids who took a test of listening 
comprehension. The columns are their scores (1 = correct, 0 = incorrect,  . = 
missing) on 140 test items. The items are numbered sequentially and are ordered 
by increasing difficulty as you go from left to right across the columns. I 
want R to go through the data and find the first correct response for each 
case. Because of basal and ceiling rules, many cases have missing data on many 
items before the first correct response appears.
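
A sketch of one way data coded like this could be brought into R so that "." becomes a true missing value. The inline text stands in for a real file, and the three column names are a cut-down, hypothetical version of the 140 item variables:

```r
# Hypothetical raw file contents in which "." marks a missing score.
raw <- "LC1a_score,LC2a_score,LC3a_score
0,1,1
.,.,1
0,0,."

# na.strings = "." makes read.csv() treat "." as NA, so the item
# columns come in as numeric rather than character.
dat <- read.csv(text = raw, na.strings = ".")
str(dat)
```

With the columns numeric, comparisons such as `dat[, 1] == 1` behave as expected and missing scores show up as NA.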

For each case, I want R to evaluate the item responses sequentially starting 
with item 1. If the score is 0 or missing, proceed to the next item and 
evaluate it. If the score is 1, stop the operation for that case, record the 
item number of that first correct response in a new variable, proceed to the 
next case, and restart the operation.

In SPSS, this operation would be carried out with LOOP, VECTOR, and DO IF, as 
follows (assuming the data set is already loaded):

* DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST CORRECT RESPONSE, 
SET IT EQUAL TO 0.
numeric LCfirst1.
comp LCfirst1 = 0

* DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
vector x=LC1a_score to LC140a_score.

* SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS LCfirst1 = 0. #i IS 
AN INDEX VARIABLE THAT INCREASES BY 1 EACH TIME THE LOOP RUNS.
loop #i=1 to 140 if (LCfirst1 = 0).

* SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR EACH ELEMENT OF THE 
VECTOR.  THUS, WHEN #i = 1, THE EXPRESSION EVALUATES THE FIRST ELEMENT OF THE 
VECTOR (THAT IS, THE FIRST OF THE 140 ITEM RESPONSES). AS THE LOOP RUNS AND #i 
INCREASES, SUBSEQUENT VECTOR ELEMENTS ARE EVALUATED. THE do if STATEMENT 
RETAINS CONTROL AND KEEPS LOOPING THROUGH THE VECTOR UNTIL A '1' IS ENCOUNTERED.
+ do if x(#i) = 1.

* WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT STATEMENT, WHICH 
RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
+ comp x(#i) = 99.

* AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH RECODES THE VALUE OF 
LCfirst1 TO THE CURRENT INDEX VALUE, THUS CAPTURING THE ITEM NUMBER OF THE 
FIRST CORRECT RESPONSE FOR THAT CASE. CHANGING THE VALUE OF LCfirst1 ALSO 
CAUSES THE LOOP TO STOP EXECUTING FOR THAT CASE, AND THE PROGRAM MOVES TO THE NEXT 
CASE AND RESTARTS THE LOOP.
+ comp LCfirst1 = #i.
+ end if.
end loop.
exe.

After several hours of trying to translate this procedure to R, I'm stumped. I 
played around with creating a list to hold the item response variables 
(analogous to 'vector' in SPSS), but when I tried to use the list in an R 
procedure, I kept getting a warning along the lines of 'the list contains > 1 
element, only the first element will be used'. So perhaps a list is not the 
appropriate class to 'hold' these variables?

It seems that some nested arrangement of 'for' 'while' and/or 'lapply' will 
allow me to recreate the operation described above? How do I set up the 
indexing operation analogous to 'loop #i' in SPSS?
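
One possible reading of the SPSS VECTOR and LOOP idiom in R, sketched on a hypothetical three-item data frame. The `id` column, the cut-down item names, and the helper variables are assumptions for illustration only:

```r
# Hypothetical miniature of the data: 3 items instead of 140.
dat <- data.frame(id = 1:3,
                  LC1a_score = c(0, NA, 0),
                  LC2a_score = c(1, NA, 0),
                  LC3a_score = c(1, 1, 0))

# Analogue of SPSS 'vector x = LC1a_score to LC3a_score':
# take the block of columns between the first and last item name.
from  <- match("LC1a_score", names(dat))
to    <- match("LC3a_score", names(dat))
items <- dat[, from:to]

# Analogue of 'loop #i': i runs over item positions, and 'unset' plays
# the role of the 'LCfirst1 = 0' condition, one flag per case.
LCfirst1 <- rep(NA_integer_, nrow(items))
for (i in seq_along(items)) {
  unset <- is.na(LCfirst1)
  hit   <- !is.na(items[[i]]) & items[[i]] == 1
  LCfirst1[unset & hit] <- i
}
print(LCfirst1)
```

Here a data frame already acts as the container of item variables, so no separate list is needed; `items[[i]]` picks out the i-th item for all cases at once.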

Any help is appreciated, and I'm happy to provide more information if needed.

David S. Herzberg, Ph.D.
Vice President, Research and Development
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: dav...@wpspublish.com





Re: [R] Conditional looping over a set of variables in R

2010-10-22 Thread Adrienne Wootten
David,

Here I'm referring to your data as testmat, a matrix of 140 columns and 1500
rows, but the same or similar notation can be applied to data frames in R.
If I understand correctly, you are looking for the first response (column)
where you got a value of 1.  I'm also assuming that, since your missing
values are characters, your two numeric values are also characters.
Keeping all this in mind, try something like this.

first = c() # your extra variable which will eventually contain the first
            # correct response for each case

for(i in 1:nrow(testmat)){

  c = 1

  while( c<=ncol(testmat) | testmat[i,c] != 1 ){

    if( testmat[i,c] == 1){

      first[i] = c
      break # will exit the while loop once it finds the first correct answer,
            # and then jump to the next case

    } else {

      c=c+1 # proceed to the next column if not

    }

  }

}
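
A hedged variant of the same row-by-row idea that also copes with rows containing no 1 at all. It is a sketch under assumptions rather than a drop-in replacement: the small `testmat` below is made up, and the counter is renamed `k` (an arbitrary choice) so that it does not mask the `c()` function:

```r
# Hypothetical character matrix in the spirit of 'testmat':
# "1" = correct, "0" = incorrect, "." = missing.
testmat <- rbind(c("0", ".", "1", "1"),
                 c(".", ".", ".", "."),
                 c("1", "0", "0", "1"),
                 c("0", "0", "0", "0"))

# Pre-fill with NA so rows without any "1" still get an entry.
first <- rep(NA_integer_, nrow(testmat))

for (i in seq_len(nrow(testmat))) {
  k <- 1L
  # Check the bound first; && stops before indexing past the last column.
  # isTRUE() treats an NA comparison as "not a 1".
  while (k <= ncol(testmat) && !isTRUE(testmat[i, k] == 1)) {
    k <- k + 1L
  }
  if (k <= ncol(testmat)) first[i] <- k
}
print(first)
```

Rows two and four never reach a 1, so their entries stay NA instead of stopping the loop.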


Hope this helps you out a bit.

Adrienne Wootten
NCSU


On Fri, Oct 22, 2010 at 11:33 AM, David Herzberg dav...@wpspublish.com wrote:

 Here's the problem I'm trying to solve in R: I have a data frame that
 consists of about 1500 cases (rows) of data from kids who took a test of
 listening comprehension. The columns are their scores (1 = correct, 0 =
 incorrect,  . = missing) on 140 test items. The items are numbered
 sequentially and are ordered by increasing difficulty as you go from left to
 right across the columns. I want R to go through the data and find the first
 correct response for each case. Because of basal and ceiling rules, many
 cases have missing data on many items before the first correct response
 appears.

 For each case, I want R to evaluate the item responses sequentially
 starting with item 1. If the score is 0 or missing, proceed to the next item
 and evaluate it. If the score is 1, stop the operation for that case, record
 the item number of that first correct response in a new variable, proceed to
 the next case, and restart the operation.

 In SPSS, this operation would be carried out with LOOP, VECTOR, and DO IF,
 as follows (assuming the data set is already loaded):

 * DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST CORRECT
 RESPONSE, SET IT EQUAL TO 0.
 numeric LCfirst1.
 comp LCfirst1 = 0

 * DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
 vector x=LC1a_score to LC140a_score.

 * SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS LCfirst1 = 0. #i
 IS AN INDEX VARIABLE THAT INCREASES BY 1 EACH TIME THE LOOP RUNS.
 loop #i=1 to 140 if (LCfirst1 = 0).

 * SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR EACH ELEMENT OF
 THE VECTOR.  THUS, WHEN #i = 1, THE EXPRESSION EVALUATES THE FIRST ELEMENT
 OF THE VECTOR (THAT IS, THE FIRST OF THE 140 ITEM RESPONSES). AS THE LOOP
 RUNS AND #i INCREASES, SUBSEQUENT VECTOR ELEMENTS ARE EVALUATED. THE do if
 STATEMENT RETAINS CONTROL AND KEEPS LOOPING THROUGH THE VECTOR UNTIL A '1'
 IS ENCOUNTERED.
 + do if x(#i) = 1.

 * WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT STATEMENT, WHICH
 RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
 + comp x(#i) = 99.

 * AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH RECODES THE VALUE OF
 LCfirst1 TO THE CURRENT INDEX VALUE, THUS CAPTURING THE ITEM NUMBER OF THE
 FIRST CORRECT RESPONSE FOR THAT CASE. CHANGING THE VALUE OF LCfirst1 ALSO
 CAUSES THE LOOP TO STOP EXECUTING FOR THAT CASE, AND THE PROGRAM MOVES TO
 THE NEXT CASE AND RESTARTS THE LOOP.
 + comp LCfirst1 = #i.
 + end if.
 end loop.
 exe.

 After several hours of trying to translate this procedure to R, I'm
 stumped. I played around with creating a list to hold the item response
 variables (analogous to 'vector' in SPSS), but when I tried to use the list
 in an R procedure, I kept getting a warning along the lines of 'the list
 contains > 1 element, only the first element will be used'. So perhaps a
 list is not the appropriate class to 'hold' these variables?

 It seems that some nested arrangement of 'for' 'while' and/or 'lapply' will
 allow me to recreate the operation described above? How do I set up the
 indexing operation analogous to 'loop #i' in SPSS?

 Any help is appreciated, and I'm happy to provide more information if
 needed.

 David S. Herzberg, Ph.D.
 Vice President, Research and Development
 Western Psychological Services
 12031 Wilshire Blvd.
 Los Angeles, CA 90025-1251
 Phone: (310)478-2061 x144
 FAX: (310)478-7838
 email: dav...@wpspublish.com




Re: [R] Conditional looping over a set of variables in R

2010-10-22 Thread William Dunlap
You were a bit vague about the format of your data.
I'm assuming all columns were numeric and the entries
are one of 0, 1, and NA (missing value).  I made a
little function to generate random data of that format
for testing purposes:

makeData <- function (nrow = 1500, ncol = 140, pMissing = 0.1) 
{
    # pMissing is the proportion of missing values
    m <- matrix(sample(c(1, 0), size = nrow * ncol, replace = TRUE), 
        nrow, ncol)
    m[runif(nrow * ncol) < pMissing] <- NA
    data.frame(m)
}

E.g.,

   set.seed(168)
   d <- makeData(15,3)
   d
      X1 X2 X3
   1   1  1  1
   2   0  0 NA
   3   0  1  0
   4   0  0 NA
   5   0  1  1
   6   0  0 NA
   7   1  0  0
   8   0  1  1
   9   0  0  1
  10   1  1 NA
  11   0  0  1
  12   0  0  0
  13  NA NA NA
  14   0  0  0
  15   1  0  0

I think the following function does what you want.
The algorithm is pretty similar to what you showed.

  columnOfFirstOne <- function(data) {
      # col will be return value, one entry per row of data.
      # Fill it with NA's: NA in output will mean there were no 1's in row
      col <- rep(as.integer(NA), nrow(data))
      for (j in seq_len(ncol(data))) { # loop over columns
          # For each entry in 'col', if it has not been set yet
          # and this entry in the j'th column of data is 1 (and not
          # missing) then set to the column number.
          col[is.na(col) & !is.na(data[, j]) & data[, j] == 1] <- j
      }
      col # return this from function
  }

With the above data we get
   columnOfFirstOne(d)
   [1]  1 NA  2 NA  2 NA  1  2  3  1  3 NA NA NA  1

It seems quick enough for a dataset of your size
   dd <- makeData(nrow=1500, ncol=140)
   system.time(columnOfFirstOne(dd)) # time in seconds
   user  system elapsed 
   0.08    0.00    0.08 
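
For comparison, a fully vectorised sketch of the same computation using base R's max.col(). This is a swapped-in alternative, not the method posted above; the random test matrix and the variable names are assumptions for illustration:

```r
set.seed(1)  # arbitrary seed, only to make this sketch repeatable
# Hypothetical 0/1/NA test matrix, 200 cases by 10 items.
m <- matrix(sample(c(0, 1, NA), 200 * 10, replace = TRUE), nrow = 200)
m[1, ] <- NA  # force one case with no responses at all
m[2, ] <- 0   # and one case with no correct responses

hit <- !is.na(m) & m == 1            # TRUE where a response is a 1
first <- max.col(hit * 1, ties.method = "first")  # leftmost TRUE per row
first[rowSums(hit) == 0] <- NA       # rows without any 1 get NA

# Cross-check against the apply/match one-liner from this thread.
check <- apply(m, 1, function(x) match(1, x))
stopifnot(isTRUE(all.equal(first, check)))
```

For data frames, converting with as.matrix() first would be needed before the logical comparison is passed to max.col().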
 
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of David Herzberg
 Sent: Friday, October 22, 2010 8:34 AM
 To: r-help@r-project.org
 Subject: [R] Conditional looping over a set of variables in R
 
 Here's the problem I'm trying to solve in R: I have a data 
 frame that consists of about 1500 cases (rows) of data from 
 kids who took a test of listening comprehension. The columns 
 are their scores (1 = correct, 0 = incorrect,  . = missing) 
 on 140 test items. The items are numbered sequentially and 
 are ordered by increasing difficulty as you go from left to 
 right across the columns. I want R to go through the data and 
 find the first correct response for each case. Because of 
 basal and ceiling rules, many cases have missing data on many 
 items before the first correct response appears.
 
 For each case, I want R to evaluate the item responses 
 sequentially starting with item 1. If the score is 0 or 
 missing, proceed to the next item and evaluate it. If the 
 score is 1, stop the operation for that case, record the item 
 number of that first correct response in a new variable, 
 proceed to the next case, and restart the operation.
 
 In SPSS, this operation would be carried out with LOOP, 
 VECTOR, and DO IF, as follows (assuming the data set is 
 already loaded):
 
 * DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST 
 CORRECT RESPONSE, SET IT EQUAL TO 0.
 numeric LCfirst1.
 comp LCfirst1 = 0
 
 * DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
 vector x=LC1a_score to LC140a_score.
 
 * SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS 
 LCfirst1 = 0. #i IS AN INDEX VARIABLE THAT INCREASES BY 1 
 EACH TIME THE LOOP RUNS.
 loop #i=1 to 140 if (LCfirst1 = 0).
 
 * SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR 
 EACH ELEMENT OF THE VECTOR.  THUS, WHEN #i = 1, THE 
 EXPRESSION EVALUATES THE FIRST ELEMENT OF THE VECTOR (THAT 
 IS, THE FIRST OF THE 140 ITEM RESPONSES). AS THE LOOP RUNS 
 AND #i INCREASES, SUBSEQUENT VECTOR ELEMENTS ARE EVALUATED. 
 THE do if STATEMENT RETAINS CONTROL AND KEEPS LOOPING THROUGH 
 THE VECTOR UNTIL A '1' IS ENCOUNTERED.
 + do if x(#i) = 1.
 
 * WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT 
 STATEMENT, WHICH RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
 + comp x(#i) = 99.
 
 * AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH 
 RECODES THE VALUE OF LCfirst1 TO THE CURRENT INDEX VALUE, 
 THUS CAPTURING THE ITEM NUMBER OF THE FIRST CORRECT RESPONSE 
 FOR THAT CASE. CHANGING THE VALUE OF LCfirst1 ALSO CAUSES 
 THE LOOP TO STOP EXECUTING FOR THAT CASE, AND THE PROGRAM 
 MOVES TO THE NEXT CASE AND RESTARTS THE LOOP.
 + comp LCfirst1 = #i.
 + end if.
 end loop.
 exe.
 
 After several hours of trying to translate this procedure to 
 R, I'm stumped. I played around with creating a list to hold 
 the item response variables (analogous to 'vector' in SPSS), 
 but when I tried to use the list in an R procedure, I kept 
 getting a warning along the lines of 'the list contains > 1 
 element, only the first element will be used'. So perhaps a 
 list is not the appropriate

Re: [R] Conditional looping over a set of variables in R

2010-10-22 Thread David Herzberg
Bill, thanks so much for this. I'll get a chance to test it later today, and 
will post the outcome.


David S. Herzberg, Ph.D.
Vice President, Research and Development 
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: dav...@wpspublish.com



-Original Message-
From: William Dunlap [mailto:wdun...@tibco.com] 
Sent: Friday, October 22, 2010 9:52 AM
To: David Herzberg; r-help@r-project.org
Subject: RE: [R] Conditional looping over a set of variables in R

You were a bit vague about the format of your data.
I'm assuming all columns were numeric and the entries are one of 0, 1, and NA 
(missing value).  I made a little function to generate random data of that 
format for testing purposes:

makeData <- function (nrow = 1500, ncol = 140, pMissing = 0.1) {
    # pMissing is the proportion of missing values
    m <- matrix(sample(c(1, 0), size = nrow * ncol, replace = TRUE), 
        nrow, ncol)
    m[runif(nrow * ncol) < pMissing] <- NA
    data.frame(m)
}

E.g.,

   set.seed(168)
   d <- makeData(15,3)
   d
      X1 X2 X3
   1   1  1  1
   2   0  0 NA
   3   0  1  0
   4   0  0 NA
   5   0  1  1
   6   0  0 NA
   7   1  0  0
   8   0  1  1
   9   0  0  1
  10   1  1 NA
  11   0  0  1
  12   0  0  0
  13  NA NA NA
  14   0  0  0
  15   1  0  0

I think the following function does what you want.
The algorithm is pretty similar to what you showed.

  columnOfFirstOne <- function(data) {
      # col will be return value, one entry per row of data.
      # Fill it with NA's: NA in output will mean there were no 1's in row
      col <- rep(as.integer(NA), nrow(data))
      for (j in seq_len(ncol(data))) { # loop over columns
          # For each entry in 'col', if it has not been set yet
          # and this entry in the j'th column of data is 1 (and not
          # missing) then set to the column number.
          col[is.na(col) & !is.na(data[, j]) & data[, j] == 1] <- j
      }
      col # return this from function
  }

With the above data we get
   columnOfFirstOne(d)
   [1]  1 NA  2 NA  2 NA  1  2  3  1  3 NA NA NA  1

It seems quick enough for a dataset of your size
   dd <- makeData(nrow=1500, ncol=140)
   system.time(columnOfFirstOne(dd)) # time in seconds
   user  system elapsed 
   0.08    0.00    0.08 
 
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

 -Original Message-
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] On Behalf Of David Herzberg
 Sent: Friday, October 22, 2010 8:34 AM
 To: r-help@r-project.org
 Subject: [R] Conditional looping over a set of variables in R
 
 Here's the problem I'm trying to solve in R: I have a data frame that 
 consists of about 1500 cases (rows) of data from kids who took a test 
 of listening comprehension. The columns are their scores (1 = correct, 
 0 = incorrect,  . = missing) on 140 test items. The items are numbered 
 sequentially and are ordered by increasing difficulty as you go from 
 left to right across the columns. I want R to go through the data and 
 find the first correct response for each case. Because of basal and 
 ceiling rules, many cases have missing data on many items before the 
 first correct response appears.
 
 For each case, I want R to evaluate the item responses sequentially 
 starting with item 1. If the score is 0 or missing, proceed to the 
 next item and evaluate it. If the score is 1, stop the operation for 
 that case, record the item number of that first correct response in a 
 new variable, proceed to the next case, and restart the operation.
 
 In SPSS, this operation would be carried out with LOOP, VECTOR, and DO 
 IF, as follows (assuming the data set is already loaded):
 
 * DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST CORRECT 
 RESPONSE, SET IT EQUAL TO 0.
 numeric LCfirst1.
 comp LCfirst1 = 0
 
 * DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
 vector x=LC1a_score to LC140a_score.
 
 * SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS
 LCfirst1 = 0. #i IS AN INDEX VARIABLE THAT INCREASES BY 1 EACH TIME 
 THE LOOP RUNS.
 loop #i=1 to 140 if (LCfirst1 = 0).
 
 * SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR EACH 
 ELEMENT OF THE VECTOR.  THUS, WHEN #i = 1, THE EXPRESSION EVALUATES 
 THE FIRST ELEMENT OF THE VECTOR (THAT IS, THE FIRST OF THE 140 ITEM 
 RESPONSES). AS THE LOOP RUNS AND #i INCREASES, SUBSEQUENT VECTOR 
 ELEMENTS ARE EVALUATED.
 THE do if STATEMENT RETAINS CONTROL AND KEEPS LOOPING THROUGH THE 
 VECTOR UNTIL A '1' IS ENCOUNTERED.
 + do if x(#i) = 1.
 
 * WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT STATEMENT, 
 WHICH RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
 + comp x(#i) = 99.
 
 * AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH RECODES THE 
 VALUE OF LCfirst1 TO THE CURRENT INDEX VALUE, THUS CAPTURING THE ITEM 
 NUMBER OF THE FIRST CORRECT RESPONSE FOR THAT CASE. CHANGING THE VALUE 
 OF LCfirst1 ALSO CAUSES THE LOOP TO STOP EXECUTING