Re: [R] comparing 2 dataframes
Hi,

The problem is that I first connect to the Access database with odbcConnectAccess and then select the dataframe with a sqlQuery. In your solution you are typing the data in by hand. But my databases consist of approximately 6 records. Maybe you have another solution?

Thanks in advance.

Regards,
Priya

On 11/7/06, Christoph Buser [EMAIL PROTECTED] wrote:
> Hi
>
> Maybe this example can help you to find your solution:
> [...]

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
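A possible way around retyping the data: merge() works on any data frame, including the ones sqlQuery returns, so the same comparison can be applied to the query results directly. The sketch below is only an illustration. The helper name compare_ratings, the file name and the table names in the commented RODBC calls are assumptions, and two small stand-in data frames replace the query results so that the code runs without a database.

```r
# With RODBC the two data frames would come from the database, for example:
#   library(RODBC)
#   channel <- odbcConnectAccess("ratings.mdb")    # hypothetical file name
#   RRC_db1 <- sqlQuery(channel, "SELECT CUSTOMER_ID, CUSTOMER_RR FROM tab1",
#                       as.is = TRUE)              # hypothetical table name
#   RRC_db2 <- sqlQuery(channel, "SELECT CUSTOMER_ID, CUSTOMER_RR FROM tab2",
#                       as.is = TRUE)              # hypothetical table name
#   odbcClose(channel)
# as.is = TRUE is meant to keep the columns as character instead of factor.

# Stand-ins for the query results (factor columns, as in the error message).
RRC_db1 <- data.frame(CUSTOMER_ID = c("1007621BR", "1008195BR", "10126BR"),
                      CUSTOMER_RR = c("4", "4-", "5+"),
                      stringsAsFactors = TRUE)
RRC_db2 <- data.frame(CUSTOMER_ID = c("164798BR", "1008195BR", "10126BR"),
                      CUSTOMER_RR = c("4", "4", "5+"),
                      stringsAsFactors = TRUE)

# Hypothetical helper: keep the customers present in both tables and flag
# whether their ratings agree.
compare_ratings <- function(db1, db2) {
  m <- merge(db1, db2, by = "CUSTOMER_ID", suffixes = c(".db1", ".db2"))
  # Compare as character so differing factor levels cannot cause an error.
  m$same_RR <- as.character(m$CUSTOMER_RR.db1) == as.character(m$CUSTOMER_RR.db2)
  m
}

result <- compare_ratings(RRC_db1, RRC_db2)
print(result)
```

Rows with same_RR equal to FALSE are the customers whose rating differs between the two tables. The function does not depend on the number of records.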
[R] comparing 2 dataframes
Hi,

I've a question about comparing 2 dataframes: RRC_db1 and RRC_db2 of different length.

For example:

RRC_db1:

       CUSTOMER_ID CUSTOMER_RR
    1    1000786BR          5+
    2    1002047BR           4
    3      10127BR          5+
    4 1004166834BR           2
    5 1004310897BR           X
    6    1006180BR           4
    7   10064798BR           4
    8    1007311BR          5+
    9    1007621BR           4
    10   1008195BR          4-
    11     10126BR          5+
    12  95323994BR           4

RRC_db2:

      CUSTOMER_ID CUSTOMER_RR
    1   1200786BR          6+
    2   1802047BR           4
    3      1027BR          1+
    4  10166834BR           2
    5       107BR           X
    6       100BR           4
    7    164798BR           4
    8   1008195BR          4-
    9     10126BR          5+

I want to pick the CUSTOMER_IDs of RRC_db1 which also exist in RRC_db2:

    third <- merge(RRC_db1, RRC_db2)

or

    third <- subset(RRC_db1, CUSTOMER_ID %in% RRC_db2$CUSTOMER_ID)

But I also want to check if the CUSTOMER_RR is correct. I had tried this:

    test <- function(RRC_db1, RRC_db2)
    {
      noteq <- c()
      for (i in 1:length(RRC_db1$CUSTOMER_ID)) {
        for (j in 1:length(RRC_db2$CUSTOMER_ID)) {
          if (RRC_db1$CUSTOMER_ID[i] == RRC_db2$CUSTOMER_ID[j]) {
            if (RRC_db1$CUSTOMER_RR[i] != RRC_db2$CUSTOMER_RR[j]) {
              noteq <- c(noteq, RRC_db1$CUSTOMER_ID[i]);
            }
          }
        }
      }
      noteq;
    }

    test(RRC_db1, RRC_db2)
    Error in Ops.factor(RRC_db1$CUSTOMER_ID[i], RRC_db2$CUSTOMER_ID[j]) :
            level sets of factors are different

But then I got this error. I don't only want the CUSTOMER_ID to be the same but also the CUSTOMER_RR.

Can you please help me? Thanks in advance.

Regards,
Priya
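The error message above comes from comparing two factors whose level sets differ, which happens when each data frame stores its own set of IDs as a factor. A minimal sketch of the effect and of one possible workaround (converting to character before comparing) follows; the variable names id1, id2, direct and same are illustrative only.

```r
# Two factors built from different sets of IDs, as happens when the two
# data frames are read separately and their columns are stored as factors.
id1 <- factor(c("1008195BR", "10126BR", "95323994BR"))
id2 <- factor(c("1008195BR", "10126BR", "100BR"))

# Comparing the factors directly fails because their level sets differ;
# tryCatch() captures the error instead of stopping the script.
direct <- tryCatch(id1[1] == id2[1], error = function(e) e)
print(inherits(direct, "error"))

# Comparing the underlying strings works.
same <- as.character(id1[1]) == as.character(id2[1])
print(same)
```

Since R 4.0.0, data.frame() no longer converts strings to factors by default, so this error should only appear when the columns are factors, for instance because the import step created them.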
Re: [R] comparing 2 dataframes
Hi

Maybe this example can help you to find your solution:

    dat1 <- data.frame(CUSTOMER_ID = c("1000786BR", "1002047BR", "10127BR",
                                       "1004166834BR", "1004310897BR", "1006180BR",
                                       "10064798BR", "1007311BR", "1007621BR",
                                       "1008195BR", "10126BR", "95323994BR"),
                       CUSTOMER_RR = c("5+", "4", "5+", "2", "X", "4", "4", "5+",
                                       "4", "4-", "5+", "4"))

    dat2 <- data.frame(CUSTOMER_ID = c("1200786BR", "1802047BR", "1027BR",
                                       "10166834BR", "107BR", "100BR", "164798BR",
                                       "1008195BR", "10126BR"),
                       CUSTOMER_RR = c("6+", "4", "1+", "2", "X", "4", "4", "4", "5+"))

    ## Merge, but only by CUSTOMER_ID
    datM <- merge(dat1, dat2, by = "CUSTOMER_ID")
    datM

    ## Select only cases that have the same CUSTOMER_RR in both tables.
    ## The comparison is done row by row with ==, since %in% would also
    ## accept a rating that only appears in some other row.
    datM1 <- datM[as.character(datM[, "CUSTOMER_RR.x"]) ==
                  as.character(datM[, "CUSTOMER_RR.y"]), ]
    datM1

Regards,

Christoph

--
Credit and Surety PML study: visit our web page www.cs-pml.org
--
Christoph Buser [EMAIL PROTECTED]
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673  fax: 632-1228
http://stat.ethz.ch/~buser/
--

Priya Kanhai writes:
 > Hi,
 >
 > I've a question about comparing 2 dataframes: RRC_db1 and RRC_db2 of different length.
 > [...]
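The nested loop from the original question collects the IDs that occur in both tables but carry different ratings. A vectorized sketch of that idea using match() is given below. The helper name mismatched_ids is hypothetical, the data are a small subset of the example values above, and the sketch assumes that each CUSTOMER_ID occurs at most once in the second table (match() only returns the first hit).

```r
dat1 <- data.frame(CUSTOMER_ID = c("1008195BR", "10126BR", "95323994BR"),
                   CUSTOMER_RR = c("4-", "5+", "4"))
dat2 <- data.frame(CUSTOMER_ID = c("100BR", "1008195BR", "10126BR"),
                   CUSTOMER_RR = c("4", "4", "5+"))

# Hypothetical helper: IDs present in both tables whose ratings differ.
mismatched_ids <- function(db1, db2) {
  # Work on character vectors so that factor levels cannot interfere.
  id1 <- as.character(db1$CUSTOMER_ID)
  id2 <- as.character(db2$CUSTOMER_ID)
  pos <- match(id1, id2)      # row of each db1 ID in db2, NA if absent
  found <- !is.na(pos)
  rr1 <- as.character(db1$CUSTOMER_RR)[found]
  rr2 <- as.character(db2$CUSTOMER_RR)[pos[found]]
  id1[found][rr1 != rr2]
}

noteq <- mismatched_ids(dat1, dat2)
print(noteq)
```

Unlike the double loop, this makes a single pass over the IDs and returns them as character strings rather than as factor codes.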