[R] How Can I Concatenate Every Row in a Data Frame with Every Other Row?

2009-03-21 Thread Donald Macnaughton
I have a data frame with roughly 500 rows and 120 variables.  I would like
to generate a new data frame that will include one row for each PAIR of
rows in the original data frame and will include all 120 + 120 = 240
variables from the two rows.  I need only one row for each pair, not two
rows.  Thus the new data frame will contain 500 x 499 / 2 = 124,750 rows.  
 
Is there an easy way to do this with R?  
 
Thanks in advance,
 
Don Macnaughton

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How Can I Concatenate Every Row in a Data Frame with Every Other Row?

2009-03-21 Thread jim holtman
Try this:

> x <- data.frame(a=1:100, b=100:1, c=sample(100))
> # assume even number of rows: bind the even/odd together
> even <- seq(nrow(x)) %% 2
> new.x <- cbind(x[even==1,], x[even==0,])
>
> head(new.x)
    a   b  c a.1 b.1 c.1
1   1 100 69   2  99  60
3   3  98 24   4  97  26
5   5  96 71   6  95  43
7   7  94 17   8  93  70
9   9  92 10  10  91  79
11 11  90 56  12  89  50



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] How Can I Concatenate Every Row in a Data Frame with Every Other Row?

2009-03-21 Thread Duncan Murdoch

On 21/03/2009 12:01 PM, Donald Macnaughton wrote:
> I have a data frame with roughly 500 rows and 120 variables.  I would like
> to generate a new data frame that will include one row for each PAIR of
> rows in the original data frame and will include all 120 + 120 = 240
> variables from the two rows.  I need only one row for each pair, not two
> rows.  Thus the new data frame will contain 500 x 499 / 2 = 124,750 rows.
>
> Is there an easy way to do this with R?


Probably the easiest is to generate row indices for each pair, e.g.


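n <- nrow(mydata)

# all n*n ordered (row1, row2) index combinations
row1 <- rep(1:n, n)
row2 <- rep(1:n, each=n)
# keep each unordered pair once, and drop self-pairs
keep <- row1 < row2

big <- cbind(mydata[row1[keep],], mydata[row2[keep],])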


With a simple example

> mydata <- data.frame(a=1:3, b=letters[1:3])
> mydata
  a b
1 1 a
2 2 b
3 3 c

this produces

> big
    a b a b
1   1 a 2 b
1.1 1 a 3 c
2   2 b 3 c
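
The duplicated column names in the result (a b a b) can make the combined frame awkward to index by name. A minimal sketch of one way around that, using the same index construction as above: the helper name `pair_rows` and the suffixes `.1` and `.2` are assumptions made for illustration and do not come from the thread.

```r
# Hypothetical helper (not from the thread): one output row per unordered
# pair of rows, built with the row1/row2/keep index construction shown above.
pair_rows <- function(d) {
  n <- nrow(d)
  row1 <- rep(seq_len(n), n)
  row2 <- rep(seq_len(n), each = n)
  keep <- row1 < row2
  left  <- d[row1[keep], , drop = FALSE]
  right <- d[row2[keep], , drop = FALSE]
  # The ".1" / ".2" suffixes are an assumption chosen for illustration
  names(left)  <- paste0(names(d), ".1")
  names(right) <- paste0(names(d), ".2")
  out <- cbind(left, right)
  rownames(out) <- NULL   # replace row names like "1.1" with 1, 2, 3, ...
  out
}

mydata <- data.frame(a = 1:3, b = letters[1:3])
big <- pair_rows(mydata)
names(big)
```

With distinct names, a column from either half of a pair can be selected directly, for example `big$a.2`.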



Re: [R] How Can I Concatenate Every Row in a Data Frame with Every Other Row?

2009-03-21 Thread David Winsemius
I hacked at it a bit differently than Duncan. See if these help pages and
this example point another way:


?combn
?"["


> df <- data.frame(a = 1:4, b=LETTERS[1:4])
> n <- nrow(df)
> cbind(df[combn(1:n,2)[1,],], df[combn(1:n,2)[2,],] )
    a b a b
1   1 A 2 B
1.1 1 A 3 C
1.2 1 A 4 D
2   2 B 3 C
2.1 2 B 4 D
3   3 C 4 D
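
A variant sketch of the same idea that calls combn() only once and keeps the index matrix for reuse, then checks the row count against the 500 x 499 / 2 formula from the original question. The names `idx` and `paired` are assumptions for illustration.

```r
df <- data.frame(a = 1:4, b = LETTERS[1:4])

# 2 x choose(n, 2) matrix: each column holds the row numbers of one pair
idx <- combn(seq_len(nrow(df)), 2)

paired <- cbind(df[idx[1, ], ], df[idx[2, ], ])

# Number of unordered pairs is n * (n - 1) / 2
nrow(paired) == choose(nrow(df), 2)

# Row count for the 500-row case in the original question
choose(500, 2)
```

Storing the index matrix also makes it easy to record which two original rows each output row came from.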


David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] How Can I Concatenate Every Row in a Data Frame with Every Other Row?

2009-03-21 Thread Donald Macnaughton
On Sat, Mar 21, 2009 at 12:01 PM, I wrote:
> I have a data frame with roughly 500 rows and 120 variables.
> I would like to generate a new data frame that will include
> one row for each PAIR of rows in the original data frame and
> will include all 120 + 120 = 240 variables from the two rows.
> I need only one row for each pair, not two rows.  Thus the
> new data frame will contain 500 x 499 / 2 = 124,750 rows.
>
> Is there an easy way to do this with R?
>
> Thanks in advance,
>
> Don Macnaughton


I thank David Winsemius, Duncan Murdoch, and Jim Holtman for their helpful
replies.  Jim wrote:

> What is the problem that you are trying to solve?

This work is for a client whose son was accused of cheating on a multiple
choice exam.  One can investigate this matter statistically by computing
the number of matching answers to questions on the exam between all pairs
of students.  Of course under the null hypothesis of no cheating the number
of matching answers has a certain distribution, which allows one to reject
the null hypothesis if the number of matching answers is unduly large for a
particular pair.  (The distribution is generally taken with respect to the
average number of correct answers in a given pair because the more correct
answers, the more matches can be expected under the null hypothesis.)
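
The counting step described above can be sketched directly from the pair indices, without first building the 124,750-row combined data frame. This is a minimal sketch under stated assumptions: `answers` is a hypothetical character matrix with one row per student and one column per question, filled here with simulated data rather than real exam responses, and all variable names are illustrative. It covers only the match count, not the null distribution.

```r
set.seed(1)
n_students  <- 6
n_questions <- 10

# Simulated answers (assumption): one row per student, one column per question
answers <- matrix(sample(c("A", "B", "C", "D"),
                         n_students * n_questions, replace = TRUE),
                  nrow = n_students)

# Each column of idx holds the row numbers of one pair of students
idx <- combn(seq_len(n_students), 2)

# Compare the two students of every pair question by question,
# then count the agreeing answers in each pair
matches <- rowSums(answers[idx[1, ], , drop = FALSE] ==
                   answers[idx[2, ], , drop = FALSE])

result <- data.frame(student1 = idx[1, ],
                     student2 = idx[2, ],
                     matches  = matches)

# Pairs with the most matching answers first
result[order(-result$matches), ]
```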

Wesolowsky (2000) discusses some of the statistical and ethical aspects of
this exercise.

Don Macnaughton


REFERENCE

Wesolowsky, G. O. 2000. Detecting excessive similarity in answers on
multiple choice exams.  _Journal of Applied Statistics,_ 27, 909-921.
