Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

2014-09-05 Thread Simon Kiss
HI, of course.

The a mini-version of my data-set is below, stored in d2. Then the code I'm 
working follows.
library(reshape2)
#Create d2
structure(list(row = 1:50, rank1 = structure(c(3L, 3L, 3L, 4L, 
3L, 3L, NA, NA, 3L, NA, 3L, 3L, 1L, NA, 2L, NA, 3L, NA, 2L, 1L, 
1L, 3L, NA, 6L, NA, 1L, NA, 3L, 1L, NA, 1L, NA, NA, 6L, 3L, NA, 
1L, 3L, 3L, 4L, 1L, NA, 3L, 3L, 3L, NA, 3L, 3L, NA, 1L), .Label = 
c(accessible, 
alternatives, information, responsive, social, technical, 
trade), class = factor), rank2 = structure(c(6L, 1L, 1L, 
2L, 4L, 6L, NA, NA, 6L, NA, 6L, 4L, 2L, NA, 4L, NA, 6L, NA, 1L, 
6L, 3L, 2L, NA, 3L, NA, 6L, NA, 6L, 6L, NA, 3L, NA, NA, 3L, 6L, 
NA, 6L, 6L, 6L, 7L, 3L, NA, 1L, 6L, 6L, NA, 2L, 6L, NA, 2L), .Label = 
c(accessible, 
alternatives, information, responsive, social, technical, 
trade), class = factor), rank3 = structure(c(1L, 6L, 4L, 
3L, 2L, 4L, NA, NA, 4L, NA, 1L, 1L, 6L, NA, 1L, NA, 1L, NA, 7L, 
3L, 6L, 1L, NA, 2L, NA, 4L, NA, 1L, 3L, NA, 6L, NA, NA, 4L, 2L, 
NA, 7L, 1L, 1L, 6L, 7L, NA, 6L, 1L, 1L, NA, 4L, 1L, NA, 3L), .Label = 
c(accessible, 
alternatives, information, responsive, social, technical, 
trade), class = factor), rank4 = structure(c(7L, 4L, 2L, 
1L, 1L, 7L, NA, NA, 1L, NA, 7L, 2L, 7L, NA, 3L, NA, 2L, NA, 3L, 
4L, 5L, 6L, NA, 4L, NA, 3L, NA, 4L, 4L, NA, 4L, NA, NA, 2L, 7L, 
NA, 2L, 2L, 2L, 3L, 6L, NA, 2L, 5L, 4L, NA, 1L, 2L, NA, 4L), .Label = 
c(accessible, 
alternatives, information, responsive, social, technical, 
trade), class = factor), rank5 = structure(c(2L, 7L, 6L, 
7L, 7L, 2L, NA, NA, 2L, NA, 2L, 7L, 3L, NA, 6L, NA, 7L, NA, 6L, 
7L, 4L, 7L, NA, 7L, NA, 7L, NA, 2L, 2L, NA, 2L, NA, NA, 7L, 1L, 
NA, 3L, 7L, 4L, 2L, 2L, NA, 4L, 2L, 2L, NA, 6L, 4L, NA, 5L), .Label = 
c(accessible, 
alternatives, information, responsive, social, technical, 
trade), class = factor), rank6 = structure(c(4L, 2L, 7L, 
6L, 6L, 1L, NA, NA, 7L, NA, 4L, 5L, 4L, NA, 7L, NA, 4L, NA, 4L, 
2L, 2L, 4L, NA, 1L, NA, 2L, NA, 7L, 7L, NA, 7L, NA, NA, 1L, 4L, 
NA, 4L, 4L, 7L, 1L, 4L, NA, 7L, 7L, 7L, NA, 7L, 7L, NA, 7L), .Label = 
c(accessible, 
alternatives, information, responsive, social, technical, 
trade), class = factor), rank7 = structure(c(5L, 5L, 5L, 
5L, 5L, 5L, NA, NA, 5L, NA, 5L, 6L, 5L, NA, 5L, NA, 5L, NA, 5L, 
5L, 7L, 5L, NA, 5L, NA, 5L, NA, 5L, 5L, NA, 5L, NA, NA, 5L, 5L, 
NA, 5L, NA, 5L, 5L, 5L, NA, 5L, 4L, 5L, NA, 5L, 5L, NA, 6L), .Label = 
c(accessible, 
alternatives, information, responsive, social, technical, 
trade), class = factor)), .Names = c(row, rank1, rank2, 
rank3, rank4, rank5, rank6, rank7), row.names = c(NA, 
50L), class = data.frame)


#This code is a replication of David Carlson's code (below) which works 
splendidly, but does not work on my data-set
#Melt d2: Note, I've used value.name='color' to maximize comparability with 
David's suggestion
d3 - melt(d2, id.vars=1, measure.vars=2:8, 
variable.name=rank,value.name=color)
#Make Rank Variable Numeric
d3$rank-as.numeric(d3$rank)
#Recast d3 into d4
d4- dcast(d3, row~color,value.var=rank, fill=0)
#Note that d4 appears to provide a binary variable for one if a respondent 
checked the option, but does not provide information as to which rank they 
assigned each option, but also seems to summarize the number of missing values

#David Carlson's Code
mydf - data.frame(t(replicate(100, sample(c(red, blue,  green, yellow, 
NA), 4
mydf - data.frame(rows=1:100, mydf)
colnames(mydf) - c(row, rank1, rank2, rank3, rank4)
mymelt - melt(mydf, id.vars=1, measure.vars=2:5, variable.name=rank, 
value.name=color)
mymelt$rank - as.numeric(mymelt$rank)
mycast - dcast(mymelt, row~color, value.var=rank, fill=0)

#Compare
str(mydf)
str(d2)
head(mycast)
head(d4)

Again, I'm grateful for assistance. I can't understand what how my data-set 
differs from David's sample data-set.
Simon Kiss
On Sep 4, 2014, at 2:35 PM, David L Carlson dcarl...@tamu.edu wrote:

 I think we would need enough of the data you are using to figure out how to 
 modify the process. Can you use dput() to send a small data set that fails to 
 work?
 
 David C
 
 -Original Message-
 From: Simon Kiss [mailto:sjk...@gmail.com] 
 Sent: Thursday, September 4, 2014 1:28 PM
 To: David L Carlson
 Cc: r-help@r-project.org
 Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A 
 Data Frame
 
 Hi David and list:
 This is working, except at this command
 mycast - dcast(mymelt, row~color, value.var=rank, fill=0)
 
 dcast is using length as the default aggregating function. This results in 
 not accurate results. It tells me, for example how many choices were missing 
 values and it tells me if a person selected any given option (value is 
 reported as 1).
 When I try to run your reproducible research, it works great, but something 
 with the aggregating function is not working properly with mine. 
 Any other thoughts?
 Simon
 On Aug 18, 2014, at 10:44 AM, David L Carlson dcarl...@tamu.edu wrote:
 
 Another approach using reshape2:
 
 library(reshape2)
 # 

Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

2014-09-05 Thread David L Carlson
The big difference between the data sets is that many of your rows (16) have 
all missing values. None of mine do. If you run my data and yours, you will see 
that decast throws a warning Aggregation function missing: defaulting to 
length with your data but not with mine. As a result, instead of using the 
value of rank, dcast uses length(rank) which is always 1 except when there are 
multiple missing values when it is the number of missing values. This problem 
will occur whenever there is more than one missing value on a row. The simplest 
way to handle this is to create a function that returns the first value of a 
vector and use that with the fun.aggregate= argument:

 first - function(x) {x[1]}
 d4- dcast(d3, row~color, fun.aggregate=first, value.var=rank, fill=0)

The only drawback is that this will not warn you if a category was ranked twice 
except that the NA column will be zero and one of the other columns will be 
zero. The number of missing values is the number of zeroes in your category 
columns (not including row or NA) and the value in NA is the lowest rank that 
was missing.

David C

-Original Message-
From: Simon Kiss [mailto:sjk...@gmail.com] 
Sent: Friday, September 5, 2014 10:22 AM
To: David L Carlson
Cc: r-help@r-project.org
Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data 
Frame

HI, of course.

The a mini-version of my data-set is below, stored in d2. Then the code I'm 
working follows.
library(reshape2)
#Create d2
structure(list(row = 1:50, rank1 = structure(c(3L, 3L, 3L, 4L, 
3L, 3L, NA, NA, 3L, NA, 3L, 3L, 1L, NA, 2L, NA, 3L, NA, 2L, 1L, 
1L, 3L, NA, 6L, NA, 1L, NA, 3L, 1L, NA, 1L, NA, NA, 6L, 3L, NA, 
1L, 3L, 3L, 4L, 1L, NA, 3L, 3L, 3L, NA, 3L, 3L, NA, 1L), .Label = 
c(accessible, 
alternatives, information, responsive, social, technical, 
trade), class = factor), rank2 = structure(c(6L, 1L, 1L, 
2L, 4L, 6L, NA, NA, 6L, NA, 6L, 4L, 2L, NA, 4L, NA, 6L, NA, 1L, 
6L, 3L, 2L, NA, 3L, NA, 6L, NA, 6L, 6L, NA, 3L, NA, NA, 3L, 6L, 
NA, 6L, 6L, 6L, 7L, 3L, NA, 1L, 6L, 6L, NA, 2L, 6L, NA, 2L), .Label = 
c(accessible, 
alternatives, information, responsive, social, technical, 
trade), class = factor), rank3 = structure(c(1L, 6L, 4L, 
3L, 2L, 4L, NA, NA, 4L, NA, 1L, 1L, 6L, NA, 1L, NA, 1L, NA, 7L, 
3L, 6L, 1L, NA, 2L, NA, 4L, NA, 1L, 3L, NA, 6L, NA, NA, 4L, 2L, 
NA, 7L, 1L, 1L, 6L, 7L, NA, 6L, 1L, 1L, NA, 4L, 1L, NA, 3L), .Label = 
c(accessible, 
alternatives, information, responsive, social, technical, 
trade), class = factor), rank4 = structure(c(7L, 4L, 2L, 
1L, 1L, 7L, NA, NA, 1L, NA, 7L, 2L, 7L, NA, 3L, NA, 2L, NA, 3L, 
4L, 5L, 6L, NA, 4L, NA, 3L, NA, 4L, 4L, NA, 4L, NA, NA, 2L, 7L, 
NA, 2L, 2L, 2L, 3L, 6L, NA, 2L, 5L, 4L, NA, 1L, 2L, NA, 4L), .Label = 
c(accessible, 
alternatives, information, responsive, social, technical, 
trade), class = factor), rank5 = structure(c(2L, 7L, 6L, 
7L, 7L, 2L, NA, NA, 2L, NA, 2L, 7L, 3L, NA, 6L, NA, 7L, NA, 6L, 
7L, 4L, 7L, NA, 7L, NA, 7L, NA, 2L, 2L, NA, 2L, NA, NA, 7L, 1L, 
NA, 3L, 7L, 4L, 2L, 2L, NA, 4L, 2L, 2L, NA, 6L, 4L, NA, 5L), .Label = 
c(accessible, 
alternatives, information, responsive, social, technical, 
trade), class = factor), rank6 = structure(c(4L, 2L, 7L, 
6L, 6L, 1L, NA, NA, 7L, NA, 4L, 5L, 4L, NA, 7L, NA, 4L, NA, 4L, 
2L, 2L, 4L, NA, 1L, NA, 2L, NA, 7L, 7L, NA, 7L, NA, NA, 1L, 4L, 
NA, 4L, 4L, 7L, 1L, 4L, NA, 7L, 7L, 7L, NA, 7L, 7L, NA, 7L), .Label = 
c(accessible, 
alternatives, information, responsive, social, technical, 
trade), class = factor), rank7 = structure(c(5L, 5L, 5L, 
5L, 5L, 5L, NA, NA, 5L, NA, 5L, 6L, 5L, NA, 5L, NA, 5L, NA, 5L, 
5L, 7L, 5L, NA, 5L, NA, 5L, NA, 5L, 5L, NA, 5L, NA, NA, 5L, 5L, 
NA, 5L, NA, 5L, 5L, 5L, NA, 5L, 4L, 5L, NA, 5L, 5L, NA, 6L), .Label = 
c(accessible, 
alternatives, information, responsive, social, technical, 
trade), class = factor)), .Names = c(row, rank1, rank2, 
rank3, rank4, rank5, rank6, rank7), row.names = c(NA, 
50L), class = data.frame)


#This code is a replication of David Carlson's code (below) which works 
splendidly, but does not work on my data-set
#Melt d2: Note, I've used value.name='color' to maximize comparability with 
David's suggestion
d3 - melt(d2, id.vars=1, measure.vars=2:8, 
variable.name=rank,value.name=color)
#Make Rank Variable Numeric
d3$rank-as.numeric(d3$rank)
#Recast d3 into d4
d4- dcast(d3, row~color,value.var=rank, fill=0)
#Note that d4 appears to provide a binary variable for one if a respondent 
checked the option, but does not provide information as to which rank they 
assigned each option, but also seems to summarize the number of missing values

#David Carlson's Code
mydf - data.frame(t(replicate(100, sample(c(red, blue,  green, yellow, 
NA), 4
mydf - data.frame(rows=1:100, mydf)
colnames(mydf) - c(row, rank1, rank2, rank3, rank4)
mymelt - melt(mydf, id.vars=1, measure.vars=2:5, variable.name=rank, 
value.name=color)
mymelt$rank - as.numeric(mymelt$rank)
mycast - dcast(mymelt, row~color, value.var=rank, 

Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

2014-09-04 Thread Simon Kiss
Hi David and list:
This is working, except at this command
mycast - dcast(mymelt, row~color, value.var=rank, fill=0)

dcast is using length as the default aggregating function. This results in 
not accurate results. It tells me, for example how many choices were missing 
values and it tells me if a person selected any given option (value is reported 
as 1).
When I try to run your reproducible research, it works great, but something 
with the aggregating function is not working properly with mine. 
Any other thoughts?
Simon
On Aug 18, 2014, at 10:44 AM, David L Carlson dcarl...@tamu.edu wrote:

 Another approach using reshape2:
 
 library(reshape2)
 # Construct data/ add column of row numbers
 set.seed(42)
 mydf - data.frame(t(replicate(100, sample(c(red, blue,
 +   green, yellow, NA), 4
 mydf - data.frame(rows=1:100, mydf)
 colnames(mydf) - c(row, rank1, rank2, rank3, rank4)
 head(mydf)
  row  rank1  rank2  rank3 rank4
 1   1   NA yellowred  blue
 2   2 yellow  green   NA   red
 3   3 yellow  green   blue  NA
 4   4   NA   blue yellow green
 5   5   NAred   blue green
 6   6   NAred  green  blue
 # Reshape
 mymelt - melt(mydf, id.vars=1, measure.vars=2:5, 
 + variable.name=rank, value.name=color)
 # Convert rank to numeric
 mymelt$rank - as.numeric(mymelt$rank)
 mycast - dcast(mymelt, row~color, value.var=rank, fill=0)
 head(mycast)
  row blue green red yellow NA
 1   14 0   3  2  1
 2   20 2   4  1  3
 3   33 2   0  1  4
 4   42 4   0  3  1
 5   53 4   2  0  1
 6   64 3   2  0  1
 
 David C
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of David L Carlson
 Sent: Sunday, August 17, 2014 6:32 PM
 To: Simon Kiss; r-help@r-project.org
 Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A 
 Data Frame
 
 There is probably an easier way to do this, but
 
 set.seed(42)
 mydf - data.frame(t(replicate(100, sample(c(red, blue,
 +  green, yellow, NA), 4
 colnames(mydf) - c(rank1, rank2, rank3, rank4)
 head(mydf)
   rank1  rank2  rank3 rank4
 1   NA yellowred  blue
 2 yellow  green   NA   red
 3 yellow  green   blue  NA
 4   NA   blue yellow green
 5   NAred   blue green
 6   NAred  green  blue
 lvls - levels(mydf$rank1)
 # convert color factors to numeric
 for (i in seq_along(mydf)) mydf[,i] - as.numeric(mydf[,i]) 
 # stack the columns
 mydf2 - stack(mydf)
 # convert rank factor to numeric
 mydf2$ind - as.numeric(mydf2$ind)
 # add row numbers
 mydf2 - data.frame(rows=1:100, mydf2)
 # Create table
 mytbl - xtabs(ind~rows+values, mydf2)
 # convert to data frame
 mydf3 - data.frame(unclass(mytbl))
 colnames(mydf3) - lvls
 head(mydf3)
  blue green red yellow
 14 0   3  2
 20 2   4  1
 33 2   0  1
 42 4   0  3
 53 4   2  0
 64 3   2  0
 
 David C
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Simon Kiss
 Sent: Friday, August 15, 2014 3:58 PM
 To: r-help@r-project.org
 Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A 
 Data Frame
 
 
 Both the suggestions I got work very well, but what I didn't realize is that 
 NA values would cause serious problems.  Where there is a missing value, 
 using the argument na.last=NA to order just returns the the order of the 
 factor levels, but excludes the missing values, but I have no idea where 
 those occur in the or rather which of those variables were actually missing.  
 Have I explained this problem sufficiently? 
 I didn't think it would cause such a problem so I didn't include it in the 
 original problem definition.
 Yours, Simon
 On Jul 25, 2014, at 4:58 PM, David L Carlson dcarl...@tamu.edu wrote:
 
 I think this gets what you want. But your data are not reproducible since 
 they are randomly drawn without setting a seed and the two data sets have no 
 relationship to one another.
 
 set.seed(42)
 mydf - data.frame(t(replicate(100, sample(c(red, blue,
 + green, yellow)
 colnames(mydf) - c(rank1, rank2, rank3, rank4)
 mydf2 - data.frame(t(apply(mydf, 1, order)))
 colnames(mydf2) - levels(mydf$rank1)
 head(mydf)
  rank1  rank2  rank3 rank4
 1 yellow  greenred  blue
 2  green   blue yellow   red
 3  green yellowred  blue
 4 yellowred  green  blue
 5 yellowred  green  blue
 6 yellowred   blue green
 head(mydf2)
 blue green red yellow
 14 2   3  1
 22 1   4  3
 34 1   3  2
 44 3   2  1
 54 3   2  1
 63 4   2  1
 
 -
 David L Carlson
 Department of Anthropology
 Texas AM University
 College Station, TX 77840-4352
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Simon Kiss
 Sent: Friday, July 25, 2014 2:34 PM
 To: 

Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

2014-09-04 Thread David L Carlson
I think we would need enough of the data you are using to figure out how to 
modify the process. Can you use dput() to send a small data set that fails to 
work?

David C

-Original Message-
From: Simon Kiss [mailto:sjk...@gmail.com] 
Sent: Thursday, September 4, 2014 1:28 PM
To: David L Carlson
Cc: r-help@r-project.org
Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data 
Frame

Hi David and list:
This is working, except at this command
mycast - dcast(mymelt, row~color, value.var=rank, fill=0)

dcast is using length as the default aggregating function. This results in 
not accurate results. It tells me, for example how many choices were missing 
values and it tells me if a person selected any given option (value is reported 
as 1).
When I try to run your reproducible research, it works great, but something 
with the aggregating function is not working properly with mine. 
Any other thoughts?
Simon
On Aug 18, 2014, at 10:44 AM, David L Carlson dcarl...@tamu.edu wrote:

 Another approach using reshape2:
 
 library(reshape2)
 # Construct data/ add column of row numbers
 set.seed(42)
 mydf - data.frame(t(replicate(100, sample(c(red, blue,
 +   green, yellow, NA), 4
 mydf - data.frame(rows=1:100, mydf)
 colnames(mydf) - c(row, rank1, rank2, rank3, rank4)
 head(mydf)
  row  rank1  rank2  rank3 rank4
 1   1   NA yellowred  blue
 2   2 yellow  green   NA   red
 3   3 yellow  green   blue  NA
 4   4   NA   blue yellow green
 5   5   NAred   blue green
 6   6   NAred  green  blue
 # Reshape
 mymelt - melt(mydf, id.vars=1, measure.vars=2:5, 
 + variable.name=rank, value.name=color)
 # Convert rank to numeric
 mymelt$rank - as.numeric(mymelt$rank)
 mycast - dcast(mymelt, row~color, value.var=rank, fill=0)
 head(mycast)
  row blue green red yellow NA
 1   14 0   3  2  1
 2   20 2   4  1  3
 3   33 2   0  1  4
 4   42 4   0  3  1
 5   53 4   2  0  1
 6   64 3   2  0  1
 
 David C
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of David L Carlson
 Sent: Sunday, August 17, 2014 6:32 PM
 To: Simon Kiss; r-help@r-project.org
 Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A 
 Data Frame
 
 There is probably an easier way to do this, but
 
 set.seed(42)
 mydf - data.frame(t(replicate(100, sample(c(red, blue,
 +  green, yellow, NA), 4
 colnames(mydf) - c(rank1, rank2, rank3, rank4)
 head(mydf)
   rank1  rank2  rank3 rank4
 1   NA yellowred  blue
 2 yellow  green   NA   red
 3 yellow  green   blue  NA
 4   NA   blue yellow green
 5   NAred   blue green
 6   NAred  green  blue
 lvls - levels(mydf$rank1)
 # convert color factors to numeric
 for (i in seq_along(mydf)) mydf[,i] - as.numeric(mydf[,i]) 
 # stack the columns
 mydf2 - stack(mydf)
 # convert rank factor to numeric
 mydf2$ind - as.numeric(mydf2$ind)
 # add row numbers
 mydf2 - data.frame(rows=1:100, mydf2)
 # Create table
 mytbl - xtabs(ind~rows+values, mydf2)
 # convert to data frame
 mydf3 - data.frame(unclass(mytbl))
 colnames(mydf3) - lvls
 head(mydf3)
  blue green red yellow
 14 0   3  2
 20 2   4  1
 33 2   0  1
 42 4   0  3
 53 4   2  0
 64 3   2  0
 
 David C
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Simon Kiss
 Sent: Friday, August 15, 2014 3:58 PM
 To: r-help@r-project.org
 Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A 
 Data Frame
 
 
 Both the suggestions I got work very well, but what I didn't realize is that 
 NA values would cause serious problems.  Where there is a missing value, 
 using the argument na.last=NA to order just returns the the order of the 
 factor levels, but excludes the missing values, but I have no idea where 
 those occur in the or rather which of those variables were actually missing.  
 Have I explained this problem sufficiently? 
 I didn't think it would cause such a problem so I didn't include it in the 
 original problem definition.
 Yours, Simon
 On Jul 25, 2014, at 4:58 PM, David L Carlson dcarl...@tamu.edu wrote:
 
 I think this gets what you want. But your data are not reproducible since 
 they are randomly drawn without setting a seed and the two data sets have no 
 relationship to one another.
 
 set.seed(42)
 mydf - data.frame(t(replicate(100, sample(c(red, blue,
 + green, yellow)
 colnames(mydf) - c(rank1, rank2, rank3, rank4)
 mydf2 - data.frame(t(apply(mydf, 1, order)))
 colnames(mydf2) - levels(mydf$rank1)
 head(mydf)
  rank1  rank2  rank3 rank4
 1 yellow  greenred  blue
 2  green   blue yellow   red
 3  green yellowred  blue
 4 yellowred  green  blue
 5 yellowred  green  blue
 6 yellowred   blue green
 head(mydf2)
 blue green red yellow
 14 2   3  1
 22 1   4  3
 

Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

2014-08-18 Thread David L Carlson
Another approach using reshape2:

 library(reshape2)
 # Construct data/ add column of row numbers
 set.seed(42)
 mydf - data.frame(t(replicate(100, sample(c(red, blue,
+   green, yellow, NA), 4
 mydf - data.frame(rows=1:100, mydf)
 colnames(mydf) - c(row, rank1, rank2, rank3, rank4)
 head(mydf)
  row  rank1  rank2  rank3 rank4
1   1   NA yellowred  blue
2   2 yellow  green   NA   red
3   3 yellow  green   blue  NA
4   4   NA   blue yellow green
5   5   NAred   blue green
6   6   NAred  green  blue
 # Reshape
 mymelt - melt(mydf, id.vars=1, measure.vars=2:5, 
+ variable.name=rank, value.name=color)
 # Convert rank to numeric
 mymelt$rank - as.numeric(mymelt$rank)
 mycast - dcast(mymelt, row~color, value.var=rank, fill=0)
 head(mycast)
  row blue green red yellow NA
1   14 0   3  2  1
2   20 2   4  1  3
3   33 2   0  1  4
4   42 4   0  3  1
5   53 4   2  0  1
6   64 3   2  0  1

David C

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of David L Carlson
Sent: Sunday, August 17, 2014 6:32 PM
To: Simon Kiss; r-help@r-project.org
Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data 
Frame

There is probably an easier way to do this, but

 set.seed(42)
 mydf - data.frame(t(replicate(100, sample(c(red, blue,
+  green, yellow, NA), 4
 colnames(mydf) - c(rank1, rank2, rank3, rank4)
 head(mydf)
   rank1  rank2  rank3 rank4
1   NA yellowred  blue
2 yellow  green   NA   red
3 yellow  green   blue  NA
4   NA   blue yellow green
5   NAred   blue green
6   NAred  green  blue
 lvls - levels(mydf$rank1)
 # convert color factors to numeric
 for (i in seq_along(mydf)) mydf[,i] - as.numeric(mydf[,i]) 
 # stack the columns
 mydf2 - stack(mydf)
 # convert rank factor to numeric
 mydf2$ind - as.numeric(mydf2$ind)
 # add row numbers
 mydf2 - data.frame(rows=1:100, mydf2)
 # Create table
 mytbl - xtabs(ind~rows+values, mydf2)
 # convert to data frame
 mydf3 - data.frame(unclass(mytbl))
 colnames(mydf3) - lvls
 head(mydf3)
  blue green red yellow
14 0   3  2
20 2   4  1
33 2   0  1
42 4   0  3
53 4   2  0
64 3   2  0

David C

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Simon Kiss
Sent: Friday, August 15, 2014 3:58 PM
To: r-help@r-project.org
Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data 
Frame


Both the suggestions I got work very well, but what I didn't realize is that NA 
values would cause serious problems.  Where there is a missing value, using the 
argument na.last=NA to order just returns the the order of the factor levels, 
but excludes the missing values, but I have no idea where those occur in the or 
rather which of those variables were actually missing.  
Have I explained this problem sufficiently? 
I didn't think it would cause such a problem so I didn't include it in the 
original problem definition.
Yours, Simon
On Jul 25, 2014, at 4:58 PM, David L Carlson dcarl...@tamu.edu wrote:

 I think this gets what you want. But your data are not reproducible since 
 they are randomly drawn without setting a seed and the two data sets have no 
 relationship to one another.
 
 set.seed(42)
 mydf - data.frame(t(replicate(100, sample(c(red, blue,
 + green, yellow)
 colnames(mydf) - c(rank1, rank2, rank3, rank4)
 mydf2 - data.frame(t(apply(mydf, 1, order)))
 colnames(mydf2) - levels(mydf$rank1)
 head(mydf)
   rank1  rank2  rank3 rank4
 1 yellow  greenred  blue
 2  green   blue yellow   red
 3  green yellowred  blue
 4 yellowred  green  blue
 5 yellowred  green  blue
 6 yellowred   blue green
 head(mydf2)
  blue green red yellow
 14 2   3  1
 22 1   4  3
 34 1   3  2
 44 3   2  1
 54 3   2  1
 63 4   2  1
 
 -
 David L Carlson
 Department of Anthropology
 Texas AM University
 College Station, TX 77840-4352
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Simon Kiss
 Sent: Friday, July 25, 2014 2:34 PM
 To: r-help@r-project.org
 Subject: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data 
 Frame
 
 Hello:
 I have data that looks like mydf, below.  It is the results of a survey where 
 participants were to put a number of statements (in this case colours) in 
 their order of preference. In this case, the rank number is the variable, and 
 the factor level for each respondent is which colour they assigned to that 
 rank.  I would like to find a way to effectively transpose the data frame so 
 that it looks like mydf2, also below, where the colours the participants were 
 able to choose are the variables and the variable score is what that person 
 

Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

2014-08-17 Thread David L Carlson
There is probably an easier way to do this, but

 set.seed(42)
 mydf - data.frame(t(replicate(100, sample(c(red, blue,
+  green, yellow, NA), 4
 colnames(mydf) - c(rank1, rank2, rank3, rank4)
 head(mydf)
   rank1  rank2  rank3 rank4
1   NA yellowred  blue
2 yellow  green   NA   red
3 yellow  green   blue  NA
4   NA   blue yellow green
5   NAred   blue green
6   NAred  green  blue
 lvls - levels(mydf$rank1)
 # convert color factors to numeric
 for (i in seq_along(mydf)) mydf[,i] - as.numeric(mydf[,i]) 
 # stack the columns
 mydf2 - stack(mydf)
 # convert rank factor to numeric
 mydf2$ind - as.numeric(mydf2$ind)
 # add row numbers
 mydf2 - data.frame(rows=1:100, mydf2)
 # Create table
 mytbl - xtabs(ind~rows+values, mydf2)
 # convert to data frame
 mydf3 - data.frame(unclass(mytbl))
 colnames(mydf3) - lvls
 head(mydf3)
  blue green red yellow
14 0   3  2
20 2   4  1
33 2   0  1
42 4   0  3
53 4   2  0
64 3   2  0

David C

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Simon Kiss
Sent: Friday, August 15, 2014 3:58 PM
To: r-help@r-project.org
Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data 
Frame


Both the suggestions I got work very well, but what I didn't realize is that NA 
values would cause serious problems.  Where there is a missing value, using the 
argument na.last=NA to order just returns the the order of the factor levels, 
but excludes the missing values, but I have no idea where those occur in the or 
rather which of those variables were actually missing.  
Have I explained this problem sufficiently? 
I didn't think it would cause such a problem so I didn't include it in the 
original problem definition.
Yours, Simon
On Jul 25, 2014, at 4:58 PM, David L Carlson dcarl...@tamu.edu wrote:

 I think this gets what you want. But your data are not reproducible since 
 they are randomly drawn without setting a seed and the two data sets have no 
 relationship to one another.
 
 set.seed(42)
 mydf - data.frame(t(replicate(100, sample(c(red, blue,
 + green, yellow)
 colnames(mydf) - c(rank1, rank2, rank3, rank4)
 mydf2 - data.frame(t(apply(mydf, 1, order)))
 colnames(mydf2) - levels(mydf$rank1)
 head(mydf)
   rank1  rank2  rank3 rank4
 1 yellow  greenred  blue
 2  green   blue yellow   red
 3  green yellowred  blue
 4 yellowred  green  blue
 5 yellowred  green  blue
 6 yellowred   blue green
 head(mydf2)
  blue green red yellow
 14 2   3  1
 22 1   4  3
 34 1   3  2
 44 3   2  1
 54 3   2  1
 63 4   2  1
 
 -
 David L Carlson
 Department of Anthropology
 Texas AM University
 College Station, TX 77840-4352
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Simon Kiss
 Sent: Friday, July 25, 2014 2:34 PM
 To: r-help@r-project.org
 Subject: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data 
 Frame
 
 Hello:
 I have data that looks like mydf, below.  It is the results of a survey where 
 participants were to put a number of statements (in this case colours) in 
 their order of preference. In this case, the rank number is the variable, and 
 the factor level for each respondent is which colour they assigned to that 
 rank.  I would like to find a way to effectively transpose the data frame so 
 that it looks like mydf2, also below, where the colours the participants were 
 able to choose are the variables and the variable score is what that person 
 ranked that variable. 
 
 Ultimately what I would like to do is a factor analysis on these items, so 
 I'd like to be able to see if people ranked red and yellow higher together 
 but ranked green and blue together lower, that sort of thing.  
 I have played around with different variations of t(), melt(), ifelse() and 
 if() but can't find a solution. 
 Thank you
 Simon
 #Reproducible code
 mydf-data.frame(rank1=sample(c('red', 'blue', 'green', 'yellow'), 
 replace=TRUE, size=100), rank2=sample(c('red', 'blue', 'green', 'yellow'), 
 replace=TRUE, size=100), rank3=sample(c('red', 'blue', 'green', 'yellow'), 
 replace=TRUE, size=100), rank4=sample(c('red', 'blue', 'green', 'yellow'), 
 replace=TRUE, size=100))
 
 mydf2-data.frame(red=sample(c(1,2,3,4), 
 replace=TRUE,size=100),blue=sample(c(1,2,3,4), 
 replace=TRUE,size=100),green=sample(c(1,2,3,4), replace=TRUE,size=100) 
 ,yellow=sample(c(1,2,3,4), replace=TRUE,size=100))
 *
 Simon J. Kiss, PhD
 Assistant Professor, Wilfrid Laurier University
 73 George Street
 Brantford, Ontario, Canada
 N3T 2C9
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 

Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

2014-08-15 Thread Simon Kiss

Both the suggestions I got work very well, but what I didn't realize is that NA 
values would cause serious problems.  Where there is a missing value, using the 
argument na.last=NA to order just returns the the order of the factor levels, 
but excludes the missing values, but I have no idea where those occur in the or 
rather which of those variables were actually missing.  
Have I explained this problem sufficiently? 
I didn't think it would cause such a problem so I didn't include it in the 
original problem definition.
Yours, Simon
On Jul 25, 2014, at 4:58 PM, David L Carlson dcarl...@tamu.edu wrote:

 I think this gets what you want. But your data are not reproducible since 
 they are randomly drawn without setting a seed and the two data sets have no 
 relationship to one another.
 
 set.seed(42)
 mydf - data.frame(t(replicate(100, sample(c(red, blue,
 + green, yellow)
 colnames(mydf) - c(rank1, rank2, rank3, rank4)
 mydf2 - data.frame(t(apply(mydf, 1, order)))
 colnames(mydf2) - levels(mydf$rank1)
 head(mydf)
   rank1  rank2  rank3 rank4
 1 yellow  greenred  blue
 2  green   blue yellow   red
 3  green yellowred  blue
 4 yellowred  green  blue
 5 yellowred  green  blue
 6 yellowred   blue green
 head(mydf2)
  blue green red yellow
 14 2   3  1
 22 1   4  3
 34 1   3  2
 44 3   2  1
 54 3   2  1
 63 4   2  1
 
 -
 David L Carlson
 Department of Anthropology
 Texas AM University
 College Station, TX 77840-4352
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Simon Kiss
 Sent: Friday, July 25, 2014 2:34 PM
 To: r-help@r-project.org
 Subject: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data 
 Frame
 
 Hello:
 I have data that looks like mydf, below.  It is the results of a survey where 
 participants were to put a number of statements (in this case colours) in 
 their order of preference. In this case, the rank number is the variable, and 
 the factor level for each respondent is which colour they assigned to that 
 rank.  I would like to find a way to effectively transpose the data frame so 
 that it looks like mydf2, also below, where the colours the participants were 
 able to choose are the variables and the variable score is what that person 
 ranked that variable. 
 
 Ultimately what I would like to do is a factor analysis on these items, so 
 I'd like to be able to see if people ranked red and yellow higher together 
 but ranked green and blue together lower, that sort of thing.  
 I have played around with different variations of t(), melt(), ifelse() and 
 if() but can't find a solution. 
 Thank you
 Simon
 #Reproducible code
 mydf-data.frame(rank1=sample(c('red', 'blue', 'green', 'yellow'), 
 replace=TRUE, size=100), rank2=sample(c('red', 'blue', 'green', 'yellow'), 
 replace=TRUE, size=100), rank3=sample(c('red', 'blue', 'green', 'yellow'), 
 replace=TRUE, size=100), rank4=sample(c('red', 'blue', 'green', 'yellow'), 
 replace=TRUE, size=100))
 
 mydf2-data.frame(red=sample(c(1,2,3,4), 
 replace=TRUE,size=100),blue=sample(c(1,2,3,4), 
 replace=TRUE,size=100),green=sample(c(1,2,3,4), replace=TRUE,size=100) 
 ,yellow=sample(c(1,2,3,4), replace=TRUE,size=100))
 *
 Simon J. Kiss, PhD
 Assistant Professor, Wilfrid Laurier University
 73 George Street
 Brantford, Ontario, Canada
 N3T 2C9
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

2014-07-25 Thread David L Carlson
I think this gets what you want. But your data are not reproducible since they 
are randomly drawn without setting a seed and the two data sets have no 
relationship to one another.

 set.seed(42)
 mydf - data.frame(t(replicate(100, sample(c(red, blue,
+ green, yellow)
 colnames(mydf) - c(rank1, rank2, rank3, rank4)
 mydf2 - data.frame(t(apply(mydf, 1, order)))
 colnames(mydf2) - levels(mydf$rank1)
 head(mydf)
   rank1  rank2  rank3 rank4
1 yellow  greenred  blue
2  green   blue yellow   red
3  green yellowred  blue
4 yellowred  green  blue
5 yellowred  green  blue
6 yellowred   blue green
 head(mydf2)
  blue green red yellow
14 2   3  1
22 1   4  3
34 1   3  2
44 3   2  1
54 3   2  1
63 4   2  1

-
David L Carlson
Department of Anthropology
Texas AM University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Simon Kiss
Sent: Friday, July 25, 2014 2:34 PM
To: r-help@r-project.org
Subject: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data 
Frame

Hello:
I have data that looks like mydf, below.  It is the results of a survey where 
participants were to put a number of statements (in this case colours) in their 
order of preference. In this case, the rank number is the variable, and the 
factor level for each respondent is which colour they assigned to that rank.  I 
would like to find a way to effectively transpose the data frame so that it 
looks like mydf2, also below, where the colours the participants were able to 
choose are the variables and the variable score is what that person ranked that 
variable. 

Ultimately what I would like to do is a factor analysis on these items, so I'd 
like to be able to see if people ranked red and yellow higher together but 
ranked green and blue together lower, that sort of thing.  
I have played around with different variations of t(), melt(), ifelse() and 
if() but can't find a solution. 
Thank you
Simon
#Reproducible code
mydf-data.frame(rank1=sample(c('red', 'blue', 'green', 'yellow'), 
replace=TRUE, size=100), rank2=sample(c('red', 'blue', 'green', 'yellow'), 
replace=TRUE, size=100), rank3=sample(c('red', 'blue', 'green', 'yellow'), 
replace=TRUE, size=100), rank4=sample(c('red', 'blue', 'green', 'yellow'), 
replace=TRUE, size=100))

mydf2-data.frame(red=sample(c(1,2,3,4), 
replace=TRUE,size=100),blue=sample(c(1,2,3,4), 
replace=TRUE,size=100),green=sample(c(1,2,3,4), replace=TRUE,size=100) 
,yellow=sample(c(1,2,3,4), replace=TRUE,size=100))
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.