Re: [R] comparing 2 dataframes

2006-11-07 Thread Priya Kanhai
Hi

The problem is that I first connect to the Access database with
odbcConnectAccess and then select the data frame with sqlQuery.
In your solution you type the data in by hand, but my databases consist of
approximately 6 records.

Maybe you have another solution? Thanks in advance.

Regards,

Priya
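
The merge-based comparison does not depend on how the data frames were created, so it should also apply to data frames returned by sqlQuery. A minimal sketch, assuming both query results contain the columns CUSTOMER_ID and CUSTOMER_RR; the helper name compare_rr, the file name and the table names are hypothetical and not taken from the thread, and the RODBC calls are left as comments so that the block runs without a database:

```r
# Hypothetical helper (not from the thread): join two data frames on
# CUSTOMER_ID and flag whether CUSTOMER_RR agrees in both.
compare_rr <- function(db1, db2) {
  cols <- c("CUSTOMER_ID", "CUSTOMER_RR")
  # Keep the two columns and convert them to character, so that factor
  # columns with different level sets can be compared safely
  db1 <- data.frame(lapply(db1[cols], as.character), stringsAsFactors = FALSE)
  db2 <- data.frame(lapply(db2[cols], as.character), stringsAsFactors = FALSE)
  # Inner join on the ID; the ratings become CUSTOMER_RR.x and CUSTOMER_RR.y
  m <- merge(db1, db2, by = "CUSTOMER_ID")
  m$same_rr <- m$CUSTOMER_RR.x == m$CUSTOMER_RR.y
  m
}

# With RODBC the inputs would come from queries instead of being typed in.
# The file and table names below are placeholders:
# library(RODBC)
# channel <- odbcConnectAccess("ratings.mdb")
# RRC_db1 <- sqlQuery(channel, "SELECT CUSTOMER_ID, CUSTOMER_RR FROM table1")
# RRC_db2 <- sqlQuery(channel, "SELECT CUSTOMER_ID, CUSTOMER_RR FROM table2")
# odbcClose(channel)

# Stand-in data frames with factor columns, as a query result might have
RRC_db1 <- data.frame(CUSTOMER_ID = factor(c("1008195BR", "10126BR", "10127BR")),
                      CUSTOMER_RR = factor(c("4-", "5+", "5+")))
RRC_db2 <- data.frame(CUSTOMER_ID = factor(c("1008195BR", "10126BR")),
                      CUSTOMER_RR = factor(c("4", "5+")))

result <- compare_rr(RRC_db1, RRC_db2)
print(result)
```

Rows with same_rr equal to TRUE have the same CUSTOMER_ID and the same CUSTOMER_RR in both sources; rows with FALSE share the ID but differ in the rating.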


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] comparing 2 dataframes

2006-11-06 Thread Priya Kanhai
Hi,

I have a question about comparing two data frames, RRC_db1 and RRC_db2, of
different lengths.

For example:

RRC_db1:

    CUSTOMER_ID CUSTOMER_RR
1     1000786BR          5+
2     1002047BR           4
3       10127BR          5+
4  1004166834BR           2
5  1004310897BR           X
6     1006180BR           4
7    10064798BR           4
8     1007311BR          5+
9     1007621BR           4
10    1008195BR          4-
11      10126BR          5+
12   95323994BR           4

RRC_db2:

  CUSTOMER_ID CUSTOMER_RR
1   1200786BR          6+
2   1802047BR           4
3      1027BR          1+
4  10166834BR           2
5       107BR           X
6       100BR           4
7    164798BR           4
8   1008195BR          4-
9     10126BR          5+


I want to pick the CUSTOMER_IDs of RRC_db1 that also exist in RRC_db2:

third <- merge(RRC_db1, RRC_db2)

or

third <- subset(RRC_db1, CUSTOMER_ID %in% RRC_db2$CUSTOMER_ID)

But I also want to check whether the CUSTOMER_RR is correct. I tried this:

> test <- function(RRC_db1, RRC_db2)
+ {
+ noteq <- c()
+ for (i in 1:length(RRC_db1$CUSTOMER_ID)) {
+ for (j in 1:length(RRC_db2$CUSTOMER_ID)) {
+ if (RRC_db1$CUSTOMER_ID[i] == RRC_db2$CUSTOMER_ID[j]) {
+ if (RRC_db1$CUSTOMER_RR[i] != RRC_db2$CUSTOMER_RR[j]) {
+ noteq <- c(noteq, RRC_db1$CUSTOMER_ID[i]);
+ }
+ }
+ }
+ }
+ noteq;
+ }
>
> test(RRC_db1, RRC_db2)
Error in Ops.factor(RRC_db1$CUSTOMER_ID[i], RRC_db2$CUSTOMER_ID[j]) :
level sets of factors are different


But then I got the error shown above.

I want not only the CUSTOMER_ID to be the same but also the CUSTOMER_RR.
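
The message "level sets of factors are different" is raised when == or != is applied to two factors whose levels are not the same set, which is the situation when both CUSTOMER_ID columns are stored as factors. A small illustration, assuming made-up factors f1 and f2 (not the thread's data), of how converting to character avoids the problem:

```r
# Two factors with different level sets (made-up values, for illustration)
f1 <- factor(c("1008195BR", "10126BR", "10127BR"))
f2 <- factor(c("1008195BR", "10126BR", "1027BR"))

# Comparing the factors directly fails because their levels differ;
# tryCatch captures the error message instead of stopping the script
direct <- tryCatch(f1 == f2, error = function(e) conditionMessage(e))
print(direct)

# Comparing the underlying strings works element by element
same <- as.character(f1) == as.character(f2)
print(same)
```

Wrapping the compared columns in as.character(), or reading them as character in the first place, would let the comparison run without the error.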

Can you please help me?

Thanks in advance.

Regards,

Priya



Re: [R] comparing 2 dataframes

2006-11-06 Thread Christoph Buser
Hi

Maybe this example can help you to find your solution:

dat1 <- data.frame(CUSTOMER_ID = c("1000786BR", "1002047BR", "10127BR",
                     "1004166834BR", "1004310897BR", "1006180BR",
                     "10064798BR", "1007311BR", "1007621BR",
                     "1008195BR", "10126BR", "95323994BR"),
                   CUSTOMER_RR = c("5+", "4", "5+", "2", "X", "4", "4", "5+",
                     "4", "4-", "5+", "4"))

dat2 <- data.frame(CUSTOMER_ID = c("1200786BR", "1802047BR", "1027BR",
                     "10166834BR", "107BR", "100BR", "164798BR", "1008195BR",
                     "10126BR"),
                   CUSTOMER_RR = c("6+", "4", "1+", "2", "X", "4", "4", "4",
                     "5+"))

## Merge, but only by CUSTOMER_ID
datM <- merge(dat1, dat2, by = "CUSTOMER_ID")
datM
## Select only cases that have the same CUSTOMER_RR in both data frames.
## The comparison is row by row (==); %in% would instead test membership
## in the whole CUSTOMER_RR.y column.
datM1 <- datM[as.character(datM[, "CUSTOMER_RR.x"]) ==
              as.character(datM[, "CUSTOMER_RR.y"]), ]
datM1
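
The nested loop from the question can also be expressed without loops. A minimal sketch, assuming small made-up inputs db1 and db2 (only the column names follow the thread): merge() without a by argument joins on all shared columns, so it keeps rows where both CUSTOMER_ID and CUSTOMER_RR agree, while merging by CUSTOMER_ID alone and comparing the ratings with != gives the IDs that the loop collected in noteq.

```r
# Small made-up inputs; only the column names follow the thread
db1 <- data.frame(CUSTOMER_ID = c("1008195BR", "10126BR", "10127BR"),
                  CUSTOMER_RR = c("4-", "5+", "5+"),
                  stringsAsFactors = FALSE)
db2 <- data.frame(CUSTOMER_ID = c("1008195BR", "10126BR", "1027BR"),
                  CUSTOMER_RR = c("4", "5+", "1+"),
                  stringsAsFactors = FALSE)

# Default merge: joins on every column name the two data frames share,
# so only rows with the same ID and the same rating remain
both <- merge(db1, db2)

# Merge on the ID only, then compare the two rating columns row by row
m <- merge(db1, db2, by = "CUSTOMER_ID")
differs <- as.character(m$CUSTOMER_RR.x) != as.character(m$CUSTOMER_RR.y)

# IDs present in both data frames whose ratings disagree
noteq <- as.character(m$CUSTOMER_ID[differs])

print(both)
print(noteq)
```

This sketch ignores missing ratings; if CUSTOMER_RR can be NA, the comparison yields NA for those rows and they would need separate handling.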

Regards,

Christoph

--

Credit and Surety PML study: visit our web page www.cs-pml.org

--
Christoph Buser [EMAIL PROTECTED]
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


