Re: [R] Sort problem with merge (again)
On Mon, 25 Sep 2006, Bruce LaZerte wrote:

> # R version 2.3.1 (2006-06-01) Debian Linux testing
> # Is the following behaviour a bug, feature or just a lack of
> # understanding on my part? I see that this was discussed here
> # last March with no apparent resolution.

Reference?

It is the third alternative. A factor is sorted by its codes: consider

    x <- factor(1:3, levels=as.character(3:1))
    x
    [1] 1 2 3
    Levels: 3 2 1
    sort(x)
    [1] 3 2 1
    Levels: 3 2 1

and that is what is happening here: for your example the levels of df$Date are

    levels(df$Date)
    [1] "1970-04-04" "1970-08-11" "1970-10-18" "1970-06-04" "1970-08-18"

so the result is sorted correctly.

If you want to sort a character column in lexicographic order, don't make it into a factor. Similarly for a date column: use class "Date".

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
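The point about factor codes can be checked with a small sketch. The vector below is hypothetical data that mirrors the level order reported for df$Date in this thread; the variable names are illustrative only and do not come from the original posts.

```r
# Hypothetical dates, stored as a factor whose levels follow the order of
# appearance rather than chronological order (as in the merged df$Date).
dates <- c("1970-04-04", "1970-08-11", "1970-10-18", "1970-06-04", "1970-08-18")
f <- factor(dates, levels = dates)

# sort() on a factor follows the integer codes, i.e. the level order,
# so the values come back in the order of the levels.
sorted_as_factor <- as.character(sort(f))

# Sorting the character values is lexicographic; for ISO dates (yyyy-mm-dd)
# that coincides with chronological order.
sorted_as_char <- sort(as.character(f))

# Sorting as class Date is chronological by construction.
sorted_as_date <- format(sort(as.Date(as.character(f))))
```

Under these assumptions the factor sort leaves the values in level order, while the character and Date sorts agree with each other, which is consistent with the advice to store dates as class Date.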
[R] Sort problem with merge (again)
# R version 2.3.1 (2006-06-01) Debian Linux testing
# Is the following behaviour a bug, feature or just a lack of
# understanding on my part? I see that this was discussed here
# last March with no apparent resolution.

d <- as.factor(c("1970-04-04","1970-08-11","1970-10-18"))
x <- c(9,10,11)
ch <- data.frame(Date=d,X=x)

d <- as.factor(c("1970-06-04","1970-08-11","1970-08-18"))
y <- c(109,110,111)
sp <- data.frame(Date=d,Y=y)

df <- merge(ch,sp,all=TRUE,by="Date")
# the rows with dates missing all ch vars are tacked on the end.
# the rows with dates missing all sp vars are sorted in with
# the row with a date with vars from both ch and sp
# is.ordered(df$Date) returns FALSE
# The rows of df are not sorted as they should be as sort=TRUE
# is the default. Adding sort=TRUE does nothing.
# So try this:
# dd <- df[order(df$Date),]
# But that doesn't work.
# Nor does sort(df$Date)
# But sort(as.vector(df$Date)) does work.
# As does order(as.vector(df$Date)), so this works:
dd <- df[order(as.vector(df$Date)),]
# ?
Re: [R] Sort problem with merge (again)
If you want it to act like a date, store it as a Date:

dx <- as.Date(c("1970-04-04","1970-08-11","1970-10-18")) ###
x <- c(9,10,11)
ch <- data.frame(Date=dx,X=x)

dy <- as.Date(c("1970-06-04","1970-08-11","1970-08-18")) ###
y <- c(109,110,111)
sp <- data.frame(Date=dy,Y=y)

merge(ch, sp, all = TRUE)

By the way you might consider using zoo objects here:

library(zoo)
chz <- zoo(x, dx)
spz <- zoo(y, dy)
merge(chz, spz)

See:
vignette("zoo")
[R] Sort problem in merge()
Hello!

I am merging two datasets and I have encountered a problem with sort. Can someone please point me to my error? Here is the example.

## I have dataframes, first one with factor and second one with factor
## and integer
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)
tmp1
  col1
1    A
2    A
3    C
4    C
5    0
6    0
tmp2
  col1 col2
1    C    1
2    D    2
3    E    3
4    F    4

## Now merge them
(tmp12 <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
                all.x = TRUE, sort = FALSE))
  col1 col2
1    C    1
2    C    1
3    A   NA
4    A   NA
5    0   NA
6    0   NA

## As you can see, sort was applied, since row order is not the same as
## in tmp1. Reading help page for ?merge did not reveal much about
## sorting. However I did try to see the result of non-default -
## help page says that order should be the same as in 'y'. So above
## makes sense

## Now merge - but change x and y
(tmp21 <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
                all.y = TRUE, sort = FALSE))
  col1 col2
1    C    1
2    C    1
3    A   NA
4    A   NA
5    0   NA
6    0   NA

## The result is the same. I am stumped here. But looking a bit at these
## objects I found something peculiar

str(tmp1)
`data.frame': 6 obs. of 1 variable:
 $ col1: Factor w/ 3 levels "0","A","C": 2 2 3 3 1 1
str(tmp2)
`data.frame': 4 obs. of 2 variables:
 $ col1: Factor w/ 4 levels "C","D","E","F": 1 2 3 4
 $ col2: int 1 2 3 4
str(tmp12)
`data.frame': 6 obs. of 2 variables:
 $ col1: Factor w/ 3 levels "0","A","C": 3 3 2 2 1 1
 $ col2: int 1 1 NA NA NA NA
str(tmp21)
`data.frame': 6 obs. of 2 variables:
 $ col1: Factor w/ 6 levels "C","D","E","F",..: 1 1 6 6 5 5
 $ col2: int 1 1 NA NA NA NA

## Is it OK that the internal representation of factors varies between
## different merges? Levels are also different: once only levels
## from the original data.frame are used, while in the second example all
## levels are propagated.

## I have tried the same with characters
tmp1$col1 <- as.character(tmp1$col1)
tmp2$col1 <- as.character(tmp2$col1)
(tmp12c <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
                 all.x = TRUE, sort = FALSE))
  col1 col2
1    C    1
2    C    1
3    A   NA
4    A   NA
5    0   NA
6    0   NA

(tmp21c <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
                 all.y = TRUE, sort = FALSE))
  col1 col2
1    C    1
2    C    1
3    A   NA
4    A   NA
5    0   NA
6    0   NA

## The same with characters. Is this a bug? It definitely does not agree
## with help page, since order is not the same as in 'y'. Can someone
## please check on newer versions?

## Is there any other way to get the same order as in 'y' i.e. tmp1?

R.version
         _
platform i486-pc-linux-gnu
arch     i486
os       linux-gnu
system   i486, linux-gnu
status
major    2
minor    2.0
year     2005
month    10
day      06
svn rev  35749
language R

Thank you very much!

-- 
Lep pozdrav / With regards,
    Gregor Gorjanc
--
University of Ljubljana     PhD student
Biotechnical Faculty
Zootechnical Department     URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3                   mail: gregor.gorjanc at bfro.uni-lj.si
SI-1230 Domzale             tel: +386 (0)1 72 17 861
Slovenia, Europe            fax: +386 (0)1 72 17 888
--
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.
Re: [R] Sort problem in merge()
If you make the levels the same does that give what you want:

levs <- c(LETTERS[1:6], "0")
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
merge(tmp2, tmp1, all = TRUE, sort = FALSE)
merge(tmp1, tmp2, all = TRUE, sort = FALSE)
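As a side note on the "same levels" suggestion, a shared level set can also be derived from the data instead of being typed by hand. This is only a sketch: the names v1, v2, f1 and f2 are illustrative and not from the thread, and the resulting level order (order of first appearance) differs from the levs vector used above.

```r
# Values as in the thread's tmp1 and tmp2 key columns.
v1 <- c("A", "A", "C", "C", "0", "0")
v2 <- c("C", "D", "E", "F")

# union() keeps the order of first appearance: A, C, 0, D, E, F.
levs <- union(v1, v2)

# Both factors now share one level set, so their integer codes are comparable.
f1 <- factor(v1, levels = levs)
f2 <- factor(v2, levels = levs)
```

Sharing levels makes the internal codes consistent between the two data frames; on its own it does not make merge() preserve the row order of either input.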
Re: [R] Sort problem in merge()
Gabor Grothendieck wrote:
> If you make the levels the same does that give what you want:

Gabor thanks for this, but unfortunately the result is the same. I get the following via both ways - note that I use all.x or all.y = TRUE.

merge(tmp2, tmp1, all.x = TRUE, sort = FALSE)
  col1 col2
1    C    1
2    C    1
3    A   NA
4    A   NA
5    0   NA
6    0   NA

But I want the order as it is in tmp1

  col1
1    A
2    A
3    C
4    C
5    0
6    0
Re: [R] Sort problem in merge()
I believe that merge is trying to put first whatever cells are nonempty. For example, if you instead did

tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F", "A"), levs), col2 = 1:5)
tmp2
  col1 col2
1    C    1
2    D    2
3    E    3
4    F    4
5    A    5
merge(tmp2, tmp1, all.y = TRUE, sort = FALSE)
  col1 col2
1    A    5
2    A    5
3    C    1
4    C    1
5    0   NA
6    0   NA
tmp1
  col1
1    A
2    A
3    C
4    C
5    0
6    0

and if you do this

tmp1 <- data.frame(col1 = factor(c("0", "0", "C", "C", "A", "A"), levs))
merge(tmp2, tmp1, all.y = TRUE, sort = FALSE)
  col1 col2
1    C    1
2    C    1
3    A    5
4    A    5
5    0   NA
6    0   NA

So I think it is doing what you want it to do.

Jean
Re: [R] Sort problem in merge()
I think you will need to reorder it:

out <- merge( cbind(tmp1, seq = 1:nrow(tmp1)), tmp2, all.x = TRUE, sort = FALSE)
out[out$seq, -2]
Re: [R] Sort problem in merge()
Actually we don't need sort = FALSE if we are reordering it anyways:

out <- merge( cbind(tmp1, seq = 1:nrow(tmp1)), tmp2, all.x = TRUE)
out[out$seq, -2]
Re: [R] Sort problem in merge()
Gabor and Jean thank you for your time and answers. Gabors approach does not do what I want (with or without sort). Gabor note that when we merge data.frame, this new data.frame gets new row.names and we can not be consistent with sort. out - merge(cbind(tmp1, seq = 1:nrow(tmp1)), tmp2, all.x = TRUE) col1 seq col2 10 5 NA 20 6 NA 3A 1 NA 4A 2 NA 5C 31 6C 41 out[out$seq, -2] col1 col2 5C1 6C1 10 NA 20 NA 3A NA 4A NA out - merge(cbind(tmp1, seq = 1:nrow(tmp1)), tmp2, all.x = TRUE, sort = TRUE) col1 seq col2 10 5 NA 20 6 NA 3A 1 NA 4A 2 NA 5C 31 6C 41 out[out$seq, -2] col1 col2 5C1 6C1 10 NA 20 NA 3A NA 4A NA But I want to get out A NA A NA C 1 C 1 0 NA 0 NA i.e. with the same order as in tmp1. I really need the same order, since I will cbind this data frame to another one and I need to keep the order intact. I am quite confident that this points to a bug in merge code or at least in merge documentation. NA's seem to introduce problems as showed by Jean. Can someone (R core) also confirm this? Gabor Grothendieck wrote: Actually we don't need sort = FALSE if we are reordering it anyways: out - merge( cbind(tmp1, seq = 1:nrow(tmp1)), tmp2, all.x = TRUE) out[out$seq, -2] On 3/6/06, Gabor Grothendieck [EMAIL PROTECTED] wrote: I think you will need to reorder it: out - merge( cbind(tmp1, seq = 1:nrow(tmp1)), tmp2, all.x = TRUE, sort = FALSE) out[out$seq, -2] On 3/6/06, Gregor Gorjanc [EMAIL PROTECTED] wrote: Gabor Grothendieck wrote: If you make the levels the same does that give what you want: levs - c(LETTERS[1:6], 0) tmp1 - data.frame(col1 = factor(c(A, A, C, C, 0, 0), levs)) tmp2 - data.frame(col1 = factor(c(C, D, E, F), levs), col2 = 1:4) merge(tmp2, tmp1, all = TRUE, sort = FALSE) merge(tmp1, tmp2, all = TRUE, sort = FALSE) Gabor thanks for this, but unfortunatelly the result is the same. I get the following via both ways - note that I use all.x or all.y = TRUE. 
merge(tmp2, tmp1, all.x = TRUE, sort = FALSE) col1 col2 1C1 2C1 3A NA 4A NA 50 NA 60 NA But I want this order as it is in tmp 1 col1 1A 2A 3C 4C 50 60 Hello! I am merging two datasets and I have encountered a problem with sort. Can someone please point me to my error. Here is the example. ## I have dataframes, first one with factor and second one with factor ## and integer tmp1 - data.frame(col1 = factor(c(A, A, C, C, 0, 0))) tmp2 - data.frame(col1 = factor(c(C, D, E, F)), col2 = 1:4) tmp1 col1 1A 2A 3C 4C 50 60 tmp2 col1 col2 1C1 2D2 3E3 4F4 ## Now merge them (tmp12 - merge(tmp1, tmp2, by.x = col1, by.y = col1, all.x = TRUE, sort = FALSE)) col1 col2 1C1 2C1 3A NA 4A NA 50 NA 60 NA ## As you can see, sort was applied, since row order is not the same as ## in tmp1. Reading help page for ?merge did not reveal much about ## sorting. However I did try to see the result of non-default - ## help page says that order should be the same as in 'y'. So above ## makes sense ## Now merge - but change x an y (tmp21 - merge(tmp2, tmp1, by.x = col1, by.y = col1, all.y = TRUE, sort = FALSE)) col1 col2 1C1 2C1 3A NA 4A NA 50 NA 60 NA ## The result is the same. I am stumped here. But looking a bit at these ## object I found something peculiar str(tmp1) `data.frame': 6 obs. of 1 variable: $ col1: Factor w/ 3 levels 0,A,C: 2 2 3 3 1 1 str(tmp2) `data.frame': 4 obs. of 2 variables: $ col1: Factor w/ 4 levels C,D,E,F: 1 2 3 4 $ col2: int 1 2 3 4 str(tmp12) `data.frame': 6 obs. of 2 variables: $ col1: Factor w/ 3 levels 0,A,C: 3 3 2 2 1 1 $ col2: int 1 1 NA NA NA NA str(tmp21) `data.frame': 6 obs. of 2 variables: $ col1: Factor w/ 6 levels C,D,E,F,..: 1 1 6 6 5 5 $ col2: int 1 1 NA NA NA NA ## Is it OK, that internal presentation of factors vary between ## different merges. Levels are also different, once only levels ## from original data.frame are used, while in second example all ## levels are propagated. 
## I have tried the same with characters
tmp1$col1 <- as.character(tmp1$col1)
tmp2$col1 <- as.character(tmp2$col1)
(tmp12c <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1", all.x = TRUE, sort = FALSE))
  col1 col2
1    C    1
2    C    1
3    A   NA
4    A   NA
5    0   NA
6    0   NA
(tmp21c <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1", all.y = TRUE, sort = FALSE))
  col1 col2
1    C    1
2    C    1
3    A   NA
4    A   NA
5    0   NA
6    0   NA

## The same with characters. Is this a bug? It definitely does not agree
## with the help page, since the order is not the same as in 'y'. Can someone
## please check on newer
Re: [R] Sort problem in merge()
On 3/6/06, Gregor Gorjanc [EMAIL PROTECTED] wrote:

But I want to get

    out
    A NA
    A NA
    C  1
    C  1
    0 NA
    0 NA

That's what I get, except for the rownames. Be sure to make the factor
levels consistent. I have renamed the data frames tmp1a and tmp2a to
distinguish them from the ones in your post, and have also reset the
rownames to be the original ones, as requested, so that the following is
self-contained and should be reproducible:

    levs <- c(LETTERS[1:6], "0")
    tmp1a <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
    tmp2a <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
    outa <- merge(cbind(tmp1a, seq = 1:nrow(tmp1a)), tmp2a, all.x = TRUE)
    outa <- outa[out$seq, -2]
    rownames(outa) <- rownames(tmp1a)
    outa
      col1 col2
    1    0   NA
    2    0   NA
    3    A   NA
    4    A   NA
    5    C    1
    6    C    1

    R.version.string # Windows XP
    [1] "R version 2.2.1, 2005-12-20"

By the way, the main limitation of this approach is that the elements of
tmp2$col1 must be unique, so that the rows of the result correspond to
those of tmp1; however, that seems to be the case here.
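The uniqueness caveat at the end of the message above can be illustrated with a small sketch. The data frames tmp1d and tmp2d are hypothetical and not taken from the thread; they only show that a key repeated in the second data frame makes the merged result longer than the first one, so the rows no longer correspond one to one.

```r
# Hypothetical data (not from the thread): key "C" occurs twice in the
# second data frame.
tmp1d <- data.frame(col1 = c("A", "C"))
tmp2d <- data.frame(col1 = c("C", "C"), col2 = 1:2)

out <- merge(tmp1d, tmp2d, all.x = TRUE)
nrow(tmp1d)  # 2
nrow(out)    # 3: the single "C" row of tmp1d is matched twice

# A cheap guard before relying on a one-to-one correspondence of rows:
has_dups <- anyDuplicated(tmp2d$col1) > 0
has_dups     # TRUE here, so the rows will not line up with tmp1d
```

Checking anyDuplicated() on the key of the second data frame before merging is one possible way to catch this early.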
Re: [R] Sort problem in merge()
Sorry, I mixed up out and outa in the last post. Here it is correctly.

    levs <- c(LETTERS[1:6], "0")
    tmp1a <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
    tmp2a <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
    out <- merge(cbind(tmp1a, seq = 1:nrow(tmp1a)), tmp2a, all.x = TRUE)
    out <- out[out$seq, -2]
    rownames(out) <- rownames(tmp1a)
    out
      col1 col2
    1    A   NA
    2    A   NA
    3    C    1
    4    C    1
    5    0   NA
    6    0   NA
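A note on the reordering step used in the message above: indexing with out$seq appears to work there because the shared factor levels make the sorted merge result already follow the order of tmp1a. With the default factor levels of the original post, the same indexing returned C, C, 0, 0, A, A, as reported earlier in the thread. A minimal sketch of a variant that sorts on the helper column with order() instead, using the original tmp1 and tmp2 and assuming, as above, that the keys in tmp2 are unique:

```r
# Data from the original post, with default (alphabetical) factor levels.
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)

# Tag each row of tmp1 with its position, merge, then sort on that tag.
out <- merge(cbind(tmp1, seq = seq_len(nrow(tmp1))), tmp2, all.x = TRUE)
out <- out[order(out$seq), c("col1", "col2")]
rownames(out) <- rownames(tmp1)
out
```

order(out$seq) gives the permutation that puts the helper column back into increasing order, whereas out$seq used directly as an index applies the inverse of that permutation.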
Re: [R] Sort problem in merge()
One other idea: one could use match instead of merge:

    # tmp1a and tmp2a as defined in my previous message
    cbind(tmp1a, tmp2a[match(tmp1a$col1, tmp2a$col1), -1, drop = FALSE])
      col1 col2
    1    A   NA
    2    A   NA
    3    C    1
    4    C    1
    5    0   NA
    6    0   NA

This avoids having to muck with reordering of rows and resetting of
rownames. Like the prior solution, it assumes that the elements of
tmp2a$col1 are unique.
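The match-based lookup described above can be written as a self-contained sketch. Two details are additions and not part of the original post: the keys are compared as character strings so that differing factor levels cannot matter, and a stopifnot() guard makes the uniqueness assumption explicit. The row names are set explicitly rather than relying on what cbind() picks.

```r
levs <- c(LETTERS[1:6], "0")
tmp1a <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
tmp2a <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)

# match() returns only the first hit, so refuse duplicated lookup keys.
stopifnot(anyDuplicated(tmp2a$col1) == 0)

# Position in tmp2a of each key of tmp1a (NA where there is no match).
idx <- match(as.character(tmp1a$col1), as.character(tmp2a$col1))

# Row i of the result is row i of tmp1a plus the looked-up columns.
res <- cbind(tmp1a, tmp2a[idx, -1, drop = FALSE])
rownames(res) <- rownames(tmp1a)
res
```

Because the result is built row by row from tmp1a, its order is that of tmp1a by construction and no sorting step is involved.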