Hi
The problem is that I first connect to the Access database with
odbcConnectAccess and then select the data frame with sqlQuery.
In your solution you type the data in by hand, but my databases
contain approximately 60000 records.
Do you perhaps have another solution? Thanks in advance.
Regards,
Priya
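
A minimal sketch of how the merge approach might carry over to data frames fetched with sqlQuery, so that nothing has to be typed in by hand. The helper name compare_rr, the RR_EQUAL column, and the file and table names in the commented RODBC calls are assumptions made for illustration; they do not come from the thread. Converting the columns to character avoids the "level sets of factors are different" error, because the comparison is then made on strings rather than on factors.

```r
## Hypothetical helper (not from the thread): for customers present in both
## data frames, flag whether the two ratings agree.
compare_rr <- function(db1, db2) {
  ## Convert every column to character so that factors with different
  ## level sets can be compared safely.
  db1[] <- lapply(db1, as.character)
  db2[] <- lapply(db2, as.character)
  ## Inner join on the ID; the rating columns get the suffixes .x and .y.
  m <- merge(db1, db2, by = "CUSTOMER_ID")
  m$RR_EQUAL <- m$CUSTOMER_RR.x == m$CUSTOMER_RR.y
  m
}

## With RODBC the inputs would come from the database, for example
## (file and table names are placeholders):
##   library(RODBC)
##   ch <- odbcConnectAccess("path/to/database.mdb")
##   RRC_db1 <- sqlQuery(ch, "SELECT CUSTOMER_ID, CUSTOMER_RR FROM table1",
##                       as.is = TRUE)
##   RRC_db2 <- sqlQuery(ch, "SELECT CUSTOMER_ID, CUSTOMER_RR FROM table2",
##                       as.is = TRUE)
##   odbcClose(ch)

## Small stand-in data so the sketch runs without a database.
RRC_db1 <- data.frame(CUSTOMER_ID = c("1008195BR", "10126BR", "1007621BR"),
                      CUSTOMER_RR = c("4-", "5+", "4"))
RRC_db2 <- data.frame(CUSTOMER_ID = c("1008195BR", "10126BR", "100BR"),
                      CUSTOMER_RR = c("4", "5+", "4"))

res   <- compare_rr(RRC_db1, RRC_db2)
same  <- res[which(res$RR_EQUAL), ]   # same ID and same rating
noteq <- res[which(!res$RR_EQUAL), ]  # same ID but different rating
```

Because merge works on whole columns, this avoids the double loop over all pairs of rows, which is likely to matter with around 60000 records per table.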
On 11/7/06, Christoph Buser <[EMAIL PROTECTED]> wrote:
>
> Hi
>
> Maybe this example can help you to find your solution:
>
> dat1 <- data.frame(CUSTOMER_ID = c("1000786BR", "1002047BR", "10127BR",
>                      "1004166834BR", "1004310897BR", "1006180BR",
>                      "10064798BR", "1007311BR", "1007621BR",
>                      "1008195BR", "10126BR", "95323994BR"),
>                    CUSTOMER_RR = c("5+", "4", "5+", "2", "X", "4", "4",
>                      "5+", "4", "4-", "5+", "4"))
>
> dat2 <- data.frame(CUSTOMER_ID = c("1200786BR", "1802047BR", "1027BR",
>                      "10166834BR", "107BR", "100BR", "164798BR",
>                      "1008195BR", "10126BR"),
>                    CUSTOMER_RR = c("6+", "4", "1+", "2", "X", "4", "4",
>                      "4", "5+"))
>
> ## Merge, but only by "CUSTOMER_ID"
> datM <- merge(dat1, dat2, by = "CUSTOMER_ID")
> datM
> ## Select only cases that have the same "CUSTOMER_RR" in both data frames
> datM1 <- datM[as.character(datM[, "CUSTOMER_RR.x"]) ==
>               as.character(datM[, "CUSTOMER_RR.y"]), ]
> datM1
>
> Regards,
>
> Christoph
>
> --------------------------------------------------------------
>
> Credit and Surety PML study: visit our web page www.cs-pml.org
>
> --------------------------------------------------------------
> Christoph Buser <[EMAIL PROTECTED]>
> Seminar fuer Statistik, LEO C13
> ETH Zurich 8092 Zurich SWITZERLAND
> phone: x-41-44-632-4673 fax: 632-1228
> http://stat.ethz.ch/~buser/
> --------------------------------------------------------------
>
>
>
> Priya Kanhai writes:
> > Hi,
> >
> > I have a question about comparing two data frames, RRC_db1 and RRC_db2,
> > of different lengths.
> >
> > For example:
> >
> > RRC_db1:
> >
> > CUSTOMER_ID CUSTOMER_RR
> > 1 1000786BR 5+
> > 2 1002047BR 4
> > 3 10127BR 5+
> > 4 1004166834BR 2
> > 5 1004310897BR X
> > 6 1006180BR 4
> > 7 10064798BR 4
> > 8 1007311BR 5+
> > 9 1007621BR 4
> > 10 1008195BR 4-
> > 11 10126BR 5+
> > 12 95323994BR 4
> >
> > RRC_db2:
> >
> > CUSTOMER_ID CUSTOMER_RR
> > 1 1200786BR 6+
> > 2 1802047BR 4
> > 3 1027BR 1+
> > 4 10166834BR 2
> > 5 107BR X
> > 6 100BR 4
> > 7 164798BR 4
> > 8 1008195BR 4-
> > 9 10126BR 5+
> >
> >
> > I want to pick the CUSTOMER_IDs of RRC_db1 which also exist in RRC_db2:
> > third <- merge(RRC_db1, RRC_db2) or
> > third <- subset(RRC_db1, CUSTOMER_ID %in% RRC_db2$CUSTOMER_ID)
> >
> > But I also want to check whether the CUSTOMER_RR is correct. I tried
> > this:
> >
> > > test <- function(RRC_db1,RRC_db2)
> > + {
> > + noteq <- c()
> > + for( i in 1:length(RRC_db1$CUSTOMER_ID)){
> > + for( j in 1:length(RRC_db2$CUSTOMER_ID)){
> > + if(RRC_db1$CUSTOMER_ID[i] == RRC_db2$CUSTOMER_ID[j]){
> > + if(RRC_db1$CUSTOMER_RR[i] != RRC_db2$CUSTOMER_RR[j]){
> > + noteq <- c(noteq,RRC_db1$CUSTOMER_ID[i]);
> > + }
> > + }
> > + }
> > + }
> > + noteq;
> > + }
> > >
> > > test(RRC_db1, RRC_db2)
> > Error in Ops.factor(RRC_db1$CUSTOMER_ID[i], RRC_db2$CUSTOMER_ID[j]) :
> > level sets of factors are different
> >
> >
> > But then I got this error.
> >
> > I want not only the CUSTOMER_ID to be the same but also the CUSTOMER_RR.
> >
> > Can you please help me?
> >
> > Thanks in advance.
> >
> > Regards,
> >
> > Priya
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>