[R] configure ddply() to avoid reordering of '.variables'

2013-05-27 Thread Liviu Andronic
Hello,
I'm using ddply() in plyr and I notice that it has the habit of
re-ordering the levels of the '.variables' by which the splitting is
done. I'm concerned about correctly retrieving the original ordering.

Consider:
require(plyr)
x <- iris[ order(iris$Species, decreasing=T), ]
head(x)
#    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#101          6.3         3.3          6.0         2.5 virginica
#102          5.8         2.7          5.1         1.9 virginica
#103          7.1         3.0          5.9         2.1 virginica
#104          6.3         2.9          5.6         1.8 virginica
#105          6.5         3.0          5.8         2.2 virginica
#106          7.6         3.0          6.6         2.1 virginica
xa <- ddply(x, .(Species), function(x)
{data.frame(Sepal.Length=x$Sepal.Length, mean.adj=(x$Sepal.Length -
mean(x$Sepal.Length)))})
#  |==| 100%
##notice how the ordering of Species is different
##from that in the input data frame
head(xa)
#  Species Sepal.Length mean.adj
#1  setosa          5.1    0.094
#2  setosa          4.9   -0.106
#3  setosa          4.7   -0.306
#4  setosa          4.6   -0.406
#5  setosa          5.0   -0.006
#6  setosa          5.4    0.394
all.equal(xa$Species, x$Species)
#[1] 100 string mismatches
all.equal(xa[ order(xa$Species, decreasing=T), ]$Species, x$Species)
#[1] TRUE
all.equal(xa$Sepal.Length, x$Sepal.Length)
#[1] Mean relative difference: 0.2785
all.equal(xa[ order(xa$Species, decreasing=T), ]$Sepal.Length, x$Sepal.Length)
#[1] TRUE

In my real data, should I be concerned that simply reordering by the
'.variables' variable wouldn't necessarily restore the original
ordering as in the input data frame? Is it possible to instruct
ddply() to avoid re-ordering the supplied '.variables' variable?

Regards,
Liviu


-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] configure ddply() to avoid reordering of '.variables'

2013-05-27 Thread arun
Maybe this helps:

levels(x$Species)
#[1] "setosa"     "versicolor" "virginica"
x$Species <- factor(x$Species, levels=unique(x$Species))
xa <- ddply(x, .(Species), function(x)
 {data.frame(Sepal.Length=x$Sepal.Length, mean.adj=(x$Sepal.Length -
 mean(x$Sepal.Length)))})
head(xa)
#    Species Sepal.Length mean.adj
#1 virginica          6.3   -0.288
#2 virginica          5.8   -0.788
#3 virginica          7.1    0.512
#4 virginica          6.3   -0.288
#5 virginica          6.5   -0.088
#6 virginica          7.6    1.012
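
As a side note, the same level-order behaviour can be seen without plyr. Here is a minimal base-R sketch, assuming only the built-in iris data. split() stands in for ddply() here, and the names 'before' and 'after' are made up for illustration:

```r
# Groups come back in factor-level order, not in order of appearance
x <- iris[order(iris$Species, decreasing = TRUE), ]
before <- names(split(x$Sepal.Length, x$Species))

# Re-level so that the levels follow the order of first appearance in x
x$Species <- factor(x$Species, levels = unique(x$Species))
after <- names(split(x$Sepal.Length, x$Species))

print(before)
print(after)
```

Re-levelling only changes the order in which the groups are returned. It restores the input row order only when each group's rows are contiguous in the input, as they are here.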


A.K.


- Original Message -
From: Liviu Andronic landronim...@gmail.com
To: r-help@r-project.org Help r-help@r-project.org
Cc: 
Sent: Monday, May 27, 2013 4:47 AM
Subject: [R] configure ddply() to avoid reordering of '.variables'



Re: [R] configure ddply() to avoid reordering of '.variables'

2013-05-27 Thread arun
Also, you can check:
http://stackoverflow.com/questions/7235421/how-to-ddply-without-sorting


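keeping.order <- function(data, fn, ...) {
  # temporary column that records the original row positions
  col <- ".sortColumn"
  data[,col] <- 1:nrow(data)
  out <- fn(data, ...)
  if (!col %in% colnames(out)) stop("Ordering column not preserved by function")
  # restore the input order, then drop the helper column
  out <- out[order(out[,col]),]
  out[,col] <- NULL
  out
}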
x <- iris[ order(iris$Species, decreasing=T), ]
xa <- ddply(x, .(Species), mutate, mean.adj=Sepal.Length-mean(Sepal.Length))[-c(2:4)]
xa1 <- keeping.order(x, ddply, .(Species), mutate, mean.adj=Sepal.Length-mean(Sepal.Length))[-c(2:4)]
head(xa1)
#    Sepal.Length   Species mean.adj
#101          6.3 virginica   -0.288
#102          5.8 virginica   -0.788
#103          7.1 virginica    0.512
#104          6.3 virginica   -0.288
#105          6.5 virginica   -0.088
#106          7.6 virginica    1.012
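
If the goal is only a per-group adjustment and the rows should stay exactly where they are, a base-R alternative is ave(), which returns one value per input row, in input order. This is not from the Stack Overflow answer above, just a sketch, and the name 'xb' is made up for this example:

```r
# ave() computes the group mean for every row without reordering anything
x <- iris[order(iris$Species, decreasing = TRUE), ]
xb <- data.frame(
  Species      = x$Species,
  Sepal.Length = x$Sepal.Length,
  mean.adj     = x$Sepal.Length - ave(x$Sepal.Length, x$Species)
)
head(xb)
```

Since no splitting and recombining takes place, this should also work when the rows of a group are not contiguous in the input.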
A.K.



- Original Message -
From: arun smartpink...@yahoo.com
To: Liviu Andronic landronim...@gmail.com
Cc: R help r-help@r-project.org
Sent: Monday, May 27, 2013 10:06 AM
Subject: Re: [R] configure ddply() to avoid reordering of '.variables'
