Re: [R] comparing 2 dataframes

Priya Kanhai Tue, 07 Nov 2006 03:33:44 -0800

Hi

The problem is that I first connect to the Access database with
odbcConnectAccess and then select the data frame with sqlQuery.
In your solution you type the data in by hand, but my databases consist of
approximately 60000 records.


Maybe you have another solution? Thanks in advance.
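
For what it is worth, the merge approach does not depend on how the data
frames were created, so it should also apply to the result of sqlQuery.
Below is a minimal sketch under that assumption. The helper name compare_rr,
the file name and the table names are made up for illustration, and the
60000-row stand-in data only replaces the database so that the code runs
on its own.

```r
## Hypothetical helper: keep IDs found in both data frames and flag
## whether the two ratings agree.
compare_rr <- function(db1, db2) {
  ## Query results may contain factors; compare everything as character
  db1[] <- lapply(db1, as.character)
  db2[] <- lapply(db2, as.character)
  m <- merge(db1, db2, by = "CUSTOMER_ID", suffixes = c(".db1", ".db2"))
  m$RR_EQUAL <- m$CUSTOMER_RR.db1 == m$CUSTOMER_RR.db2
  m
}

## With RODBC the input would come from something like this
## (file and table names are assumptions, not taken from the thread):
##   ch      <- odbcConnectAccess("ratings.mdb")
##   RRC_db1 <- sqlQuery(ch, "SELECT CUSTOMER_ID, CUSTOMER_RR FROM table1")
##   RRC_db2 <- sqlQuery(ch, "SELECT CUSTOMER_ID, CUSTOMER_RR FROM table2")

## Stand-in data of similar size, so the sketch runs without a database
set.seed(1)
n <- 60000
ratings <- c("2", "4", "4-", "5+", "X")
RRC_db1 <- data.frame(CUSTOMER_ID = paste0(seq_len(n), "BR"),
                      CUSTOMER_RR = sample(ratings, n, replace = TRUE),
                      stringsAsFactors = FALSE)
RRC_db2 <- RRC_db1[sample(n, 20000), ]
## Overwrite 100 ratings with a value not used elsewhere: 100 mismatches
RRC_db2$CUSTOMER_RR[1:100] <- "6+"

res      <- compare_rr(RRC_db1, RRC_db2)
mismatch <- res[!res$RR_EQUAL, ]
nrow(mismatch)   # 100 in this made-up data
```

No loop over the records is involved, so the number of rows should not be
a problem for merge at this size.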

Regards,

Priya

On 11/7/06, Christoph Buser <[EMAIL PROTECTED]> wrote:
>
> Hi
>
> Maybe this example can help you to find your solution:
>
> dat1 <- data.frame(CUSTOMER_ID = c("1000786BR", "1002047BR", "10127BR",
>                      "1004166834BR", "1004310897BR", "1006180BR",
>                      "10064798BR", "1007311BR", "1007621BR",
>                      "1008195BR", "10126BR", "95323994BR"),
>                    CUSTOMER_RR = c("5+", "4", "5+", "2", "X", "4", "4",
>                      "5+", "4", "4-", "5+", "4"))
>
> dat2 <- data.frame(CUSTOMER_ID = c("1200786BR", "1802047BR", "1027BR",
>                      "10166834BR", "107BR", "100BR", "164798BR",
>                      "1008195BR", "10126BR"),
>                    CUSTOMER_RR = c("6+", "4", "1+", "2", "X", "4", "4",
>                      "4", "5+"))
>
> ## Merge, but only by "CUSTOMER_ID"
> datM <- merge(dat1, dat2, by = "CUSTOMER_ID")
> datM
> ## Select only cases that have the same "CUSTOMER_RR" in both data frames
> datM1 <- datM[as.character(datM[, "CUSTOMER_RR.x"]) ==
>               as.character(datM[, "CUSTOMER_RR.y"]), ]
> datM1
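
When filtering a merged result like this, it may help to keep in mind that
%in% and == answer different questions: %in% checks whether a rating occurs
anywhere in the other column, while == compares the two ratings row by row.
A small sketch with made-up IDs and ratings (not taken from the data above):

```r
## Made-up merged result: every ID has a different rating in x and y
datM <- data.frame(CUSTOMER_ID   = c("A1BR", "A2BR", "A3BR"),
                   CUSTOMER_RR.x = c("4", "5+", "2"),
                   CUSTOMER_RR.y = c("5+", "4", "X"),
                   stringsAsFactors = FALSE)

## %in%: does the x rating occur anywhere in the y column?
in_set <- datM$CUSTOMER_RR.x %in% datM$CUSTOMER_RR.y

## ==: is the x rating equal to the y rating in the same row?
same_row <- datM$CUSTOMER_RR.x == datM$CUSTOMER_RR.y

datM[in_set, ]     # rows 1 and 2 are kept although their ratings differ
datM[same_row, ]   # no rows, which is the intended answer here
```

For checking that the rating of each customer agrees, the row-wise
comparison seems to be the one that is wanted.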
>
> Regards,
>
> Christoph
>
> --------------------------------------------------------------
>
> Credit and Surety PML study: visit our web page www.cs-pml.org
>
> --------------------------------------------------------------
> Christoph Buser <[EMAIL PROTECTED]>
> Seminar fuer Statistik, LEO C13
> ETH Zurich      8092 Zurich      SWITZERLAND
> phone: x-41-44-632-4673         fax: 632-1228
> http://stat.ethz.ch/~buser/
> --------------------------------------------------------------
>
>
>
> Priya Kanhai writes:
> > Hi,
> >
> > I've a question about comparing 2 dataframes: RRC_db1 and RRC_db2 of
> > different length.
> >
> > For example:
> >
> > RRC_db1:
> >
> >     CUSTOMER_ID CUSTOMER_RR
> > 1     1000786BR                   5+
> > 2     1002047BR                    4
> > 3       10127BR                   5+
> > 4  1004166834BR                    2
> > 5  1004310897BR                    X
> > 6     1006180BR                    4
> > 7    10064798BR                    4
> > 8     1007311BR                   5+
> > 9     1007621BR                    4
> > 10    1008195BR                   4-
> > 11      10126BR                   5+
> > 12   95323994BR                    4
> >
> >  RRC_db2:
> >
> >     CUSTOMER_ID CUSTOMER_RR
> > 1     1200786BR                   6+
> > 2     1802047BR                    4
> > 3        1027BR                   1+
> > 4    10166834BR                    2
> > 5         107BR                    X
> > 6         100BR                    4
> > 7      164798BR                    4
> > 8     1008195BR                   4-
> > 9       10126BR                   5+
> >
> >
> > I want to pick the CUSTOMER_IDs of RRC_db1 which also exist in RRC_db2:
> > third <- merge(RRC_db1, RRC_db2) or
> > third <- subset(RRC_db1, CUSTOMER_ID %in% RRC_db2$CUSTOMER_ID)
> >
> > But I also want to check if the CUSTOMER_RR is correct. I tried this:
> >
> > > test <- function(RRC_db1,RRC_db2)
> > + {
> > + noteq <- c()
> > + for( i in 1:length(RRC_db1$CUSTOMER_ID)){
> > + for( j in 1:length(RRC_db2$CUSTOMER_ID)){
> > + if(RRC_db1$CUSTOMER_ID[i] == RRC_db2$CUSTOMER_ID[j]){
> > + if(RRC_db1$CUSTOMER_RR[i] != RRC_db2$CUSTOMER_RR[j]){
> > + noteq <- c(noteq,RRC_db1$CUSTOMER_ID[i]);
> > + }
> > + }
> > + }
> > + }
> > + noteq;
> > + }
> > >
> > > test(RRC_db1, RRC_db2)
> > Error in Ops.factor(RRC_db1$CUSTOMER_ID[i], RRC_db2$CUSTOMER_ID[j]) :
> >         level sets of factors are different
> >
> >
> > But then I got this error.
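
The error comes from comparing two factors with ==, which R refuses to do
when their level sets differ. Converting to character first avoids it. The
following is only a sketch of a loop-free variant: the name test2 is made
up, the data are a small stand-in with one deliberate mismatch, and it
assumes each CUSTOMER_ID occurs at most once in the second data frame
(match only returns the first hit).

```r
## Small stand-in data, stored as factors on purpose
RRC_db1 <- data.frame(CUSTOMER_ID = c("1007621BR", "1008195BR", "10126BR"),
                      CUSTOMER_RR = c("4", "4-", "5+"),
                      stringsAsFactors = TRUE)
RRC_db2 <- data.frame(CUSTOMER_ID = c("100BR", "1008195BR", "10126BR"),
                      CUSTOMER_RR = c("4", "4", "5+"),
                      stringsAsFactors = TRUE)

## Hypothetical replacement for test(): IDs present in both data frames
## whose ratings differ
test2 <- function(db1, db2) {
  id1 <- as.character(db1$CUSTOMER_ID)
  rr1 <- as.character(db1$CUSTOMER_RR)
  id2 <- as.character(db2$CUSTOMER_ID)
  rr2 <- as.character(db2$CUSTOMER_RR)
  pos   <- match(id1, id2)   # row in db2 for each db1 ID, NA if absent
  found <- !is.na(pos)
  id1[found][rr1[found] != rr2[pos[found]]]
}

test2(RRC_db1, RRC_db2)   # "1008195BR"
```

Because the comparison is vectorised, it avoids the nested loop over all
pairs of rows.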
> >
> > I don't only want the CUSTOMER_ID to be the same but also the CUSTOMER_RR.
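
If only the rows where both columns agree are needed, merging on both
columns is one option. Without a by argument, merge uses all column names
the two data frames share, so naming the columns just makes that explicit.
A minimal sketch with a small made-up subset of the data:

```r
RRC_db1 <- data.frame(CUSTOMER_ID = c("1007621BR", "1008195BR", "10126BR"),
                      CUSTOMER_RR = c("4", "4-", "5+"),
                      stringsAsFactors = FALSE)
RRC_db2 <- data.frame(CUSTOMER_ID = c("100BR", "1008195BR", "10126BR"),
                      CUSTOMER_RR = c("4", "4", "5+"),
                      stringsAsFactors = FALSE)

## Keep only rows where CUSTOMER_ID and CUSTOMER_RR both match
both <- merge(RRC_db1, RRC_db2, by = c("CUSTOMER_ID", "CUSTOMER_RR"))
both   # one row: 10126BR with rating 5+
```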
> >
> > Can you please help me?
> >
> > Thanks in advance.
> >
> > Regards,
> >
> > Priya
> >
> >      [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

