Re: [R] Grouping by Date and showing count of failures by date
There's a package called "pivottabler" which exports PivotTable: http://pivottabler.org.uk/reference/PivotTable.html . Duncan Murdoch On 30/09/2023 7:11 a.m., John Kane wrote: To follow up on Rui Barradas's post, I do not think PivotTable is an R command. You may be thinking of the "pivot_longer" and "pivot_wider" functions in the {tidyr} package which is part of {tidyverse}. On Sat, 30 Sept 2023 at 07:03, Rui Barradas wrote: At 21:29 on 29/09/2023, Paul Bernal wrote: Dear friends, Hope you are doing great. I am attaching the dataset I am working with because, when I tried to dput() it, I was not able to copy the entire result from dput(), so I apologize in advance for that. I am interested in creating a column named Failure_Date_Period that has the FAILDATE but formatted as YYYY_MM. Then I want to count the number of failures (given by column WONUM) and just have a dataframe that has the FAILDATE and the count of WONUM. I tried this: pt <- PivotTable$new() pt$addData(failuredf) pt$addColumnDataGroups("FAILDATE") pt$defineCalculation(calculationName = "FailCounts", summariseExpression="n()") pt$renderPivot() but I was not successful. Bottom line, I need to create a new dataframe that has the number of failures by FAILDATE, but in YYYY-MM format. Any help and/or guidance will be greatly appreciated. Kind regards, Paul __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Hello, No data is attached. Maybe try dput(head(failuredf, 30)) ? And where can we find non-base PivotTable? Please start the scripts with calls to library() when using non-base functionality. Hope this helps, Rui Barradas -- This e-mail was checked for viruses by AVG antivirus software.
www.avg.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
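Read together with the original question, the attempt above can be completed by first collapsing the date to a year-month string and then grouping on that. A sketch, assuming the {pivottabler} package is installed and a data frame failuredf with a Date column FAILDATE, as described in the thread (not run against the poster's actual data):

```r
# Sketch only: assumes {pivottabler} is installed and failuredf has a
# Date (or date-like) column FAILDATE, per the thread.
library(pivottabler)

# Collapse the date to a "YYYY-MM" string first, then group on that
failuredf$Failure_Date_Period <- format(as.Date(failuredf$FAILDATE), "%Y-%m")

pt <- PivotTable$new()
pt$addData(failuredf)
pt$addRowDataGroups("Failure_Date_Period")
pt$defineCalculation(calculationName = "FailCounts", summariseExpression = "n()")
pt$renderPivot()
```

The key point is that pivottabler groups on whatever values the column holds, so formatting the date before building the pivot gives one group per month rather than one per day.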
Re: [R] Grouping by Date and showing count of failures by date
In this sort of post it would help if we knew the package that was being used for the example. I found one option: https://cran.r-project.org/web/packages/pivottabler/vignettes/v00-vignettes.html There may be a way to create a custom data type that would be a date but restricted to a YYYY-MM format. I do not know how to do this. Could you work with the date as a string in a YYYY-MM format? The issue is that R will not handle the string as a date. A third option would be to look at the lubridate package, which can be installed by itself or as part of tidyverse. I do not promise that this is a solution, but it could be. -Original Message- From: R-help On Behalf Of John Kane Sent: Saturday, September 30, 2023 7:11 AM To: Rui Barradas Cc: Paul Bernal ; R Subject: Re: [R] Grouping by Date and showing count of failures by date [External Email] To follow up on Rui Barradas's post, I do not think PivotTable is an R command. You may be thinking of the "pivot_longer" and "pivot_wider" functions in the {tidyr} package which is part of {tidyverse}. On Sat, 30 Sept 2023 at 07:03, Rui Barradas wrote: > At 21:29 on 29/09/2023, Paul Bernal wrote: > > Dear friends, > > > > Hope you are doing great. I am attaching the dataset I am working > > with because, when I tried to dput() it, I was not able to copy the > > entire result from dput(), so I apologize in advance for that. > > > > I am interested in creating a column named Failure_Date_Period that > > has > the > > FAILDATE but formatted as YYYY_MM. Then I want to count the number > > of failures (given by column WONUM) and just have a dataframe that > > has the FAILDATE and the count of WONUM. > > > > I tried this: > > pt <- PivotTable$new() > > pt$addData(failuredf) > > pt$addColumnDataGroups("FAILDATE") > > pt$defineCalculation(calculationName = "FailCounts", > > summariseExpression="n()") > > pt$renderPivot() > > > > but I was not successful.
Bottom line, I need to create a new > > dataframe that has the number of failures by FAILDATE, but in YYYY-MM > > format. > > > > Any help and/or guidance will be greatly appreciated. > > > > Kind regards, > > Paul > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > Hello, > > No data is attached. Maybe try > > dput(head(failuredf, 30)) > > ? > > And where can we find non-base PivotTable? Please start the scripts > with calls to library() when using non-base functionality. > > Hope this helps, > > Rui Barradas > > > -- > This e-mail was checked for viruses by AVG antivirus software.
> www.avg.com > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html
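The string-based approach suggested above can be sketched in base R, with no packages. The data frame below is invented for illustration; only the column names FAILDATE and WONUM are taken from the thread:

```r
# Minimal base-R sketch: count failures per year-month.
# The rows here are made up; FAILDATE/WONUM column names follow the thread.
failuredf <- data.frame(
  WONUM = c("W1", "W2", "W3", "W4"),
  FAILDATE = as.Date(c("2023-01-05", "2023-01-20", "2023-02-02", "2023-02-28"))
)

# Format the date as a "YYYY-MM" string...
failuredf$Failure_Date_Period <- format(failuredf$FAILDATE, "%Y-%m")

# ...then count work orders per period
fail_counts <- aggregate(WONUM ~ Failure_Date_Period, data = failuredf, FUN = length)
fail_counts
#   Failure_Date_Period WONUM
# 1             2023-01     2
# 2             2023-02     2
```

R happily groups on the "YYYY-MM" string even though it is no longer a Date, which is usually all that is needed for a monthly count.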
Re: [R] Grouping by Date and showing count of failures by date
To follow up on Rui Barradas's post, I do not think PivotTable is an R command. You may be thinking of the "pivot_longer" and "pivot_wider" functions in the {tidyr} package which is part of {tidyverse}. On Sat, 30 Sept 2023 at 07:03, Rui Barradas wrote: > At 21:29 on 29/09/2023, Paul Bernal wrote: > > Dear friends, > > > > Hope you are doing great. I am attaching the dataset I am working with > > because, when I tried to dput() it, I was not able to copy the entire > > result from dput(), so I apologize in advance for that. > > > > I am interested in creating a column named Failure_Date_Period that has > the > > FAILDATE but formatted as YYYY_MM. Then I want to count the number of > > failures (given by column WONUM) and just have a dataframe that has the > > FAILDATE and the count of WONUM. > > > > I tried this: > > pt <- PivotTable$new() > > pt$addData(failuredf) > > pt$addColumnDataGroups("FAILDATE") > > pt$defineCalculation(calculationName = "FailCounts", > > summariseExpression="n()") > > pt$renderPivot() > > > > but I was not successful. Bottom line, I need to create a new dataframe > > that has the number of failures by FAILDATE, but in YYYY-MM format. > > > > Any help and/or guidance will be greatly appreciated. > > > > Kind regards, > > Paul > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > Hello, > > No data is attached. Maybe try > > dput(head(failuredf, 30)) > > ? > > And where can we find non-base PivotTable? Please start the scripts with > calls to library() when using non-base functionality. > > Hope this helps, > > Rui Barradas > > > -- > This e-mail was checked for viruses by AVG antivirus software.
> www.avg.com > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- John Kane Kingston ON Canada [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping by Date and showing count of failures by date
At 21:29 on 29/09/2023, Paul Bernal wrote: Dear friends, Hope you are doing great. I am attaching the dataset I am working with because, when I tried to dput() it, I was not able to copy the entire result from dput(), so I apologize in advance for that. I am interested in creating a column named Failure_Date_Period that has the FAILDATE but formatted as YYYY_MM. Then I want to count the number of failures (given by column WONUM) and just have a dataframe that has the FAILDATE and the count of WONUM. I tried this: pt <- PivotTable$new() pt$addData(failuredf) pt$addColumnDataGroups("FAILDATE") pt$defineCalculation(calculationName = "FailCounts", summariseExpression="n()") pt$renderPivot() but I was not successful. Bottom line, I need to create a new dataframe that has the number of failures by FAILDATE, but in YYYY-MM format. Any help and/or guidance will be greatly appreciated. Kind regards, Paul __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Hello, No data is attached. Maybe try dput(head(failuredf, 30)) ? And where can we find non-base PivotTable? Please start the scripts with calls to library() when using non-base functionality. Hope this helps, Rui Barradas -- This e-mail was checked for viruses by AVG antivirus software. www.avg.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Grouping by Date and showing count of failures by date
Dear friends, Hope you are doing great. I am attaching the dataset I am working with because, when I tried to dput() it, I was not able to copy the entire result from dput(), so I apologize in advance for that. I am interested in creating a column named Failure_Date_Period that has the FAILDATE but formatted as YYYY_MM. Then I want to count the number of failures (given by column WONUM) and just have a dataframe that has the FAILDATE and the count of WONUM. I tried this: pt <- PivotTable$new() pt$addData(failuredf) pt$addColumnDataGroups("FAILDATE") pt$defineCalculation(calculationName = "FailCounts", summariseExpression="n()") pt$renderPivot() but I was not successful. Bottom line, I need to create a new dataframe that has the number of failures by FAILDATE, but in YYYY-MM format. Any help and/or guidance will be greatly appreciated. Kind regards, Paul __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping Question
Here's a very "step by step" example with dplyr as I'm trying to teach myself the Tidyverse way of being.

library(dplyr)

# Serial Measurement Meas_test Serial_test
# 1      17          fail      fail
# 1      16          pass      fail
# 2      12          pass      pass
# 2      8           pass      pass
# 2      10          pass      pass
# 3      19          fail      fail
# 3      13          pass      pass

dat <- as.data.frame(list(Serial = c(1, 1, 2, 2, 2, 3, 3),
                          Measurement = c(17, 16, 12, 8, 10, 19, 13),
                          Meas_test = c("fail", "pass", "pass", "pass", "pass", "fail", "pass")))

dat %>%
  group_by(Serial) %>%
  summarise(Serial_test = sum(Meas_test == "fail")) %>%
  mutate(Serial_test = if_else(Serial_test > 0, 1, 0),
         Serial_test = factor(Serial_test, levels = 0:1, labels = c("pass", "fail"))) -> groupedDat

dat %>% left_join(groupedDat) # add -> dat to the end to pipe to dat

Gives:

  Serial Measurement Meas_test Serial_test
1      1          17      fail        fail
2      1          16      pass        fail
3      2          12      pass        pass
4      2           8      pass        pass
5      2          10      pass        pass
6      3          19      fail        fail
7      3          13      pass        fail

Would be easier for us if you used dput() to share your data but thanks for the minimal example! Chris - Original Message - > From: "Ivan Krylov" > To: "Thomas Subia via R-help" > Cc: "Thomas Subia" > Sent: Sunday, 22 March, 2020 07:24:15 > Subject: Re: [R] Grouping Question > On Sat, 21 Mar 2020 20:01:30 -0700 > Thomas Subia via R-help wrote: > >> Serial_test is a pass, when all of the Meas_test are pass for a given >> serial. Else Serial_test is a fail. > > Use by/tapply in base R or dplyr::group_by if you prefer tidyverse > packages. > > -- > Best regards, > Ivan > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Chris Evans Visiting Professor, University of Sheffield I do some consultation work for the University of Roehampton and other places but remains my main Email address.
I have a work web site at: https://www.psyctc.org/psyctc/ and a site I manage for CORE and CORE system trust at: http://www.coresystemtrust.org.uk/ I have "semigrated" to France, see: https://www.psyctc.org/pelerinage2016/semigrating-to-france/ That page will also take you to my blog which started with earlier joys in France and Spain! If you want to book to talk, I am trying to keep that to Thursdays and my diary is at: https://www.psyctc.org/pelerinage2016/ceworkdiary/ Beware: French time, generally an hour ahead of UK. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping Question
On Sat, 21 Mar 2020 20:01:30 -0700 Thomas Subia via R-help wrote: > Serial_test is a pass, when all of the Meas_test are pass for a given > serial. Else Serial_test is a fail. Use by/tapply in base R or dplyr::group_by if you prefer tidyverse packages. -- Best regards, Ivan __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
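Ivan's by/tapply suggestion might look like the following in base R, using ave() (a close relative of tapply) on the data from the original post. This is a sketch, not the poster's verified solution:

```r
# Base-R sketch: flag a whole serial as "fail" if any of its
# measurements failed. Data taken from the original post.
dat <- data.frame(
  Serial = c(1, 1, 2, 2, 2, 3, 3),
  Measurement = c(17, 16, 12, 8, 10, 19, 13)
)

# Per-measurement test: pass when Measurement <= 16
dat$Meas_test <- ifelse(dat$Measurement <= 16, "pass", "fail")

# Per-serial test: "pass" only if every Meas_test in the serial is "pass"
dat$Serial_test <- ave(dat$Meas_test, dat$Serial,
                       FUN = function(x) if (all(x == "pass")) "pass" else "fail")
dat
```

ave() applies the function within each group and recycles the single result back over the group's rows, which is exactly the "broadcast the group verdict to every row" shape the question asks for.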
[R] Grouping Question
Colleagues, Here is my dataset.

Serial Measurement Meas_test Serial_test
1      17          fail      fail
1      16          pass      fail
2      12          pass      pass
2      8           pass      pass
2      10          pass      pass
3      19          fail      fail
3      13          pass      pass

If a measurement is less than or equal to 16, then Meas_test is pass; else Meas_test is fail. This is easy to code. Serial_test is a pass when all of the Meas_test are pass for a given serial; else Serial_test is a fail. I'm at a loss to figure out how to do this in R. Some guidance would be appreciated. All the best, Thomas Subia __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping by 3 variable and renaming groups
Rui Your first code worked just fine. Jeff -Original Message- From: Rui Barradas <ruipbarra...@sapo.pt> Sent: Saturday, May 26, 2018 8:30 AM To: reichm...@sbcglobal.net; 'R-help' <r-help@r-project.org> Subject: Re: [R] Grouping by 3 variable and renaming groups Hello, Sorry, but I think my first answer is wrong. You probably want something along the lines of sp <- split(priceStore_Grps, priceStore_Grps$StorePC) res <- lapply(seq_along(sp), function(i){ sp[[i]]$StoreID <- paste("Store", i, sep = "_") sp[[i]] }) res <- do.call(rbind, res) row.names(res) <- NULL Hope this helps, Rui Barradas On 5/26/2018 2:22 PM, Rui Barradas wrote: > Hello, > > See if this is it: > > priceStore_Grps$StoreID <- paste("Store", > seq_len(nrow(priceStore_Grps)), sep = "_") > > > Hope this helps, > > Rui Barradas > > On 5/26/2018 2:03 PM, Jeff Reichman wrote: >> ALCON >> >> >> I'm trying to figure out how to rename groups in a data frame after >> groups by selected variabels. I am using the dplyr library to group >> my data by 3 variables as follows >> >> >> # group by lat (StoreX)/long (StoreY) >> >> priceStore <- LapTopSales[,c(4,5,15,16)] >> >> priceStore <- priceStore[complete.cases(priceStore), ] # keep only >> non NA records >> >> priceStore_Grps <- priceStore %>% >> >>group_by(StorePC, StoreX, StoreY) %>% >> >>summarize(meanPrice=(mean(RetailPrice))) >> >> >> which results in . >> >> >>> priceStore_Grps >> >> # A tibble: 15 x 4 >> >> # Groups: StorePC, StoreX [?] >> >> StorePC StoreX StoreY meanPrice >> >> >> >> 1 CR7 8LE 532714 168302 472. >> >> 2 E2 0RY 535652 182961 520. >> >> 3 E7 8NW 541428 184515 467. >> >> 4 KT2 5AU 517917 170243 522. >> >> 5 N17 6QA 533788 189994 523. >> >> >> Which is fine, but I then want to give each group (e.g. CR7 8LE >> 532714 >> 168302) a unique identifier (say) Store 1, 2, 3 or some other unique >> identifier. >> >> >> StorePC StoreX StoreY meanPrice >> >> >> >> 1 CR7 8LE 532714 168302 472. Store 1 >> >> 2 E2 0RY 535652 182961 520. 
Store 2 >> >> 3 E7 8NW 541428 184515 467. Store 3 >> >> 4 KT2 5AU 517917 170243 522. Store 4 >> >> 5 N17 6QA 533788 189994 523. Store 5 >> >> >> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping by 3 variable and renaming groups
Rui That did it Jeff -Original Message- From: Rui Barradas <ruipbarra...@sapo.pt> Sent: Saturday, May 26, 2018 8:23 AM To: reichm...@sbcglobal.net; 'R-help' <r-help@r-project.org> Subject: Re: [R] Grouping by 3 variable and renaming groups Hello, See if this is it: priceStore_Grps$StoreID <- paste("Store", seq_len(nrow(priceStore_Grps)), sep = "_") Hope this helps, Rui Barradas On 5/26/2018 2:03 PM, Jeff Reichman wrote: > ALCON > > > > I'm trying to figure out how to rename groups in a data frame after groups > by selected variabels. I am using the dplyr library to group my data by 3 > variables as follows > > > > # group by lat (StoreX)/long (StoreY) > > priceStore <- LapTopSales[,c(4,5,15,16)] > > priceStore <- priceStore[complete.cases(priceStore), ] # keep only non NA > records > > priceStore_Grps <- priceStore %>% > >group_by(StorePC, StoreX, StoreY) %>% > >summarize(meanPrice=(mean(RetailPrice))) > > > > which results in . > > > >> priceStore_Grps > > # A tibble: 15 x 4 > > # Groups: StorePC, StoreX [?] > > StorePC StoreX StoreY meanPrice > > > > 1 CR7 8LE 532714 168302 472. > > 2 E2 0RY 535652 182961 520. > > 3 E7 8NW 541428 184515 467. > > 4 KT2 5AU 517917 170243 522. > > 5 N17 6QA 533788 189994 523. > > > > Which is fine, but I then want to give each group (e.g. CR7 8LE 532714 > 168302) a unique identifier (say) Store 1, 2, 3 or some other unique > identifier. > > > > StorePC StoreX StoreY meanPrice > > > > 1 CR7 8LE 532714 168302 472. Store 1 > > 2 E2 0RY 535652 182961 520. Store 2 > > 3 E7 8NW 541428 184515 467. Store 3 > > 4 KT2 5AU 517917 170243 522. Store 4 > > 5 N17 6QA 533788 189994 523. Store 5 > > > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping by 3 variable and renaming groups
Hello, Sorry, but I think my first answer is wrong. You probably want something along the lines of sp <- split(priceStore_Grps, priceStore_Grps$StorePC) res <- lapply(seq_along(sp), function(i){ sp[[i]]$StoreID <- paste("Store", i, sep = "_") sp[[i]] }) res <- do.call(rbind, res) row.names(res) <- NULL Hope this helps, Rui Barradas On 5/26/2018 2:22 PM, Rui Barradas wrote: Hello, See if this is it: priceStore_Grps$StoreID <- paste("Store", seq_len(nrow(priceStore_Grps)), sep = "_") Hope this helps, Rui Barradas On 5/26/2018 2:03 PM, Jeff Reichman wrote: ALCON I'm trying to figure out how to rename groups in a data frame after groups by selected variabels. I am using the dplyr library to group my data by 3 variables as follows # group by lat (StoreX)/long (StoreY) priceStore <- LapTopSales[,c(4,5,15,16)] priceStore <- priceStore[complete.cases(priceStore), ] # keep only non NA records priceStore_Grps <- priceStore %>% group_by(StorePC, StoreX, StoreY) %>% summarize(meanPrice=(mean(RetailPrice))) which results in . priceStore_Grps # A tibble: 15 x 4 # Groups: StorePC, StoreX [?] StorePC StoreX StoreY meanPrice 1 CR7 8LE 532714 168302 472. 2 E2 0RY 535652 182961 520. 3 E7 8NW 541428 184515 467. 4 KT2 5AU 517917 170243 522. 5 N17 6QA 533788 189994 523. Which is fine, but I then want to give each group (e.g. CR7 8LE 532714 168302) a unique identifier (say) Store 1, 2, 3 or some other unique identifier. StorePC StoreX StoreY meanPrice 1 CR7 8LE 532714 168302 472. Store 1 2 E2 0RY 535652 182961 520. Store 2 3 E7 8NW 541428 184515 467. Store 3 4 KT2 5AU 517917 170243 522. Store 4 5 N17 6QA 533788 189994 523. Store 5 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping by 3 variable and renaming groups
Hello, See if this is it: priceStore_Grps$StoreID <- paste("Store", seq_len(nrow(priceStore_Grps)), sep = "_") Hope this helps, Rui Barradas On 5/26/2018 2:03 PM, Jeff Reichman wrote: ALCON I'm trying to figure out how to rename groups in a data frame after groups by selected variabels. I am using the dplyr library to group my data by 3 variables as follows # group by lat (StoreX)/long (StoreY) priceStore <- LapTopSales[,c(4,5,15,16)] priceStore <- priceStore[complete.cases(priceStore), ] # keep only non NA records priceStore_Grps <- priceStore %>% group_by(StorePC, StoreX, StoreY) %>% summarize(meanPrice=(mean(RetailPrice))) which results in . priceStore_Grps # A tibble: 15 x 4 # Groups: StorePC, StoreX [?] StorePC StoreX StoreY meanPrice 1 CR7 8LE 532714 168302 472. 2 E2 0RY 535652 182961 520. 3 E7 8NW 541428 184515 467. 4 KT2 5AU 517917 170243 522. 5 N17 6QA 533788 189994 523. Which is fine, but I then want to give each group (e.g. CR7 8LE 532714 168302) a unique identifier (say) Store 1, 2, 3 or some other unique identifier. StorePC StoreX StoreY meanPrice 1 CR7 8LE 532714 168302 472. Store 1 2 E2 0RY 535652 182961 520. Store 2 3 E7 8NW 541428 184515 467. Store 3 4 KT2 5AU 517917 170243 522. Store 4 5 N17 6QA 533788 189994 523. Store 5 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Grouping by 3 variable and renaming groups
ALCON I'm trying to figure out how to rename groups in a data frame after grouping by selected variables. I am using the dplyr library to group my data by 3 variables as follows # group by lat (StoreX)/long (StoreY) priceStore <- LapTopSales[,c(4,5,15,16)] priceStore <- priceStore[complete.cases(priceStore), ] # keep only non NA records priceStore_Grps <- priceStore %>% group_by(StorePC, StoreX, StoreY) %>% summarize(meanPrice=(mean(RetailPrice))) which results in: > priceStore_Grps # A tibble: 15 x 4 # Groups: StorePC, StoreX [?] StorePC StoreX StoreY meanPrice 1 CR7 8LE 532714 168302 472. 2 E2 0RY 535652 182961 520. 3 E7 8NW 541428 184515 467. 4 KT2 5AU 517917 170243 522. 5 N17 6QA 533788 189994 523. Which is fine, but I then want to give each group (e.g. CR7 8LE 532714 168302) a unique identifier (say) Store 1, 2, 3 or some other unique identifier. StorePC StoreX StoreY meanPrice 1 CR7 8LE 532714 168302 472. Store 1 2 E2 0RY 535652 182961 520. Store 2 3 E7 8NW 541428 184515 467. Store 3 4 KT2 5AU 517917 170243 522. Store 4 5 N17 6QA 533788 189994 523. Store 5 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
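A note for later readers: dplyr 1.0 and newer (released after this thread) provide cur_group_id(), which numbers each group directly and avoids the split/lapply round trip. A sketch with invented values mirroring the example in the post:

```r
# Sketch (requires dplyr >= 1.0): label each group with its group index.
# Values below are made up; column names follow the example in the post.
library(dplyr)

priceStore_Grps <- tibble(
  StorePC   = c("CR7 8LE", "E2 0RY", "E7 8NW"),
  StoreX    = c(532714, 535652, 541428),
  StoreY    = c(168302, 182961, 184515),
  meanPrice = c(472, 520, 467)
)

priceStore_Grps %>%
  group_by(StorePC, StoreX, StoreY) %>%
  mutate(StoreID = paste("Store", cur_group_id(), sep = "_")) %>%
  ungroup()
```

cur_group_id() returns the current group's index inside a grouped mutate(), so each (StorePC, StoreX, StoreY) combination gets a stable "Store_n" label in one step.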
Re: [R] Grouping in R
Hi, We can only guess what you really want. Maybe this.

set.seed(111)
cust <- sample(letters[1:5], 500, replace = TRUE)
value <- sample(1:1000, 500)
month <- sample(1:12, 500, replace = TRUE)
dat <- data.frame(cust, value, month)
dat.ag <- aggregate(dat$value, list(dat$month, dat$cust), sum)
head(dat.ag)

  Group.1 Group.2    x
1       1       a 2444
2       2       a 6234
3       3       a 6082
4       4       a 3691
5       5       a 3044
6       6       a 3534

dput(dat.ag)
structure(list(Group.1 = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), Group.2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("a", "b", "c", "d", "e"), class = "factor"), x = c(2444L, 6234L, 6082L, 3691L, 3044L, 3534L, 7444L, 1819L, 2295L, 4774L, 3659L, 1159L, 6592L, 1272L, 8245L, 2324L, 5189L, 3935L, 2945L, 2386L, 2796L, 2869L, 3142L, 4657L, 4411L, 6223L, 3266L, 3842L, 6056L, 7472L, 3879L, 7135L, 4544L, 4498L, 2703L, 3409L, 2748L, 2288L, 2654L, 4995L, 4626L, 5543L, 2162L, 4681L, 5853L, 6229L, 3001L, 5274L, 3852L, 2635L, 5643L, 2809L, 2988L, 3756L, 5180L, 2997L, 4883L, 4208L, 2669L, 3151L)), .Names = c("Group.1", "Group.2", "x"), row.names = c(NA, -60L), class = "data.frame")

But maybe something different. Who knows? If you wanted grouping by value use ?cut or ?findInterval Cheers Petr -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Shivi82 Sent: Thursday, June 18, 2015 9:22 AM To: r-help@r-project.org Subject: [R] Grouping in R Hi All, I am working on a data where the total row count is 25+ and have approx. 20 variables. One of the var on which i need to summarize the data is Consignor i.e. seller name.
Now the issue here is after deleting all the duplicate names i still have 55000 unique customer name and i am not sure on how to summarize the data. Is there a possibility that i could create 8 or 10 groups based on the weight or booking they made from our company and eventually all 55000 customers would fall under these 10 groups. Then it could be easier for me to analyze in which group there is a variance on a month on month level. -- View this message in context: http://r.789695.n4.nabble.com/Grouping-in-R-tp4708800.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
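Petr's ?cut hint could be applied to the "bin 55000 customers into roughly 10 groups by weight" question along these lines. All names and values below are invented; quantile breaks give roughly equal-sized groups:

```r
# Sketch: bin customers into 10 groups by total booked weight using
# quantile breaks, so each group holds about the same number of customers.
# Customer names and weights here are simulated for illustration.
set.seed(1)
cust <- data.frame(
  Consignor = paste0("cust", 1:1000),
  weight = rexp(1000, rate = 1 / 500)   # simulated booked weight
)

# Decile breaks over the observed weights
breaks <- quantile(cust$weight, probs = seq(0, 1, by = 0.1))

# cut() assigns each customer to one of the 10 bins
cust$weight_group <- cut(cust$weight, breaks = breaks,
                         include.lowest = TRUE,
                         labels = paste0("G", 1:10))
table(cust$weight_group)   # roughly equal counts per group
```

With the customers binned, a month-on-month variance summary reduces to an aggregate() (or dplyr group_by) over weight_group and month instead of 55000 individual names.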
[R] Grouping in R
Hi All, I am working on a data set where the total row count is 25+ and there are approx. 20 variables. One of the variables on which I need to summarize the data is Consignor, i.e. the seller name. Now the issue is that after deleting all the duplicate names I still have 55000 unique customer names, and I am not sure how to summarize the data. Is there a possibility that I could create 8 or 10 groups based on the weight or bookings they made with our company, so that eventually all 55000 customers would fall under these 10 groups? Then it would be easier for me to analyze in which group there is a variance on a month-on-month level. -- View this message in context: http://r.789695.n4.nabble.com/Grouping-in-R-tp4708800.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
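[Editor's note: one way to do this in base R is to bin customers by their total weight (or booking volume) into decile groups with quantile() and cut(). A minimal sketch; the column names Consignor and Weight are assumptions, not taken from the post:]

```r
## Hypothetical example data standing in for the poster's 25K+ rows.
set.seed(1)
df <- data.frame(Consignor = sample(paste0("cust", 1:100), 500, replace = TRUE),
                 Weight    = rexp(500, rate = 1/100))

## Total weight per customer:
totals <- aggregate(Weight ~ Consignor, data = df, FUN = sum)

## Bin customers into 10 groups at the weight deciles; each group
## holds roughly one tenth of the customers.
totals$Group <- cut(totals$Weight,
                    breaks = quantile(totals$Weight, probs = seq(0, 1, 0.1)),
                    include.lowest = TRUE, labels = 1:10)
table(totals$Group)
```

With the group label attached, monthly variance can then be summarized per group (10 rows) rather than per customer (55000 rows).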
[R] grouping explanatory variables into sets for GLMM
Dear all, I am trying to run a GLMM following the procedure described by Rhodes et al. (Ch. 21) in the Zuur book "Mixed effects models and extensions in R". Like in his example, I have four sets of explanatory variables: 1. Land use - 1 variable, factor (forest or agriculture) 2. Location - 1 variable, factor (riparian or upland) 3. Agricultural management - 3 variables that are binary (0 or 1 for till, manure, annual crop) 4. Vegetation patterns - 4 variables that are continuous (# of plant species in 4 different functional guilds) How do I create these sets? I would like to build my model with these sets only, instead of listing every variable. Also: is there a way of running all possible models with the different combinations of these sets and/or variables, sort of like running ordistep for ordinations? Thanks a bunch in advance for your help! Maria
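[Editor's note: one way to represent such sets, not given in the thread, is to keep each set as a character vector of column names and build formulas programmatically with reformulate(). A sketch with hypothetical variable and response names:]

```r
## Hypothetical column names -- adjust to the actual data.
sets <- list(
  LandUse    = "landuse",
  Location   = "location",
  Management = c("till", "manure", "annual_crop"),
  Vegetation = c("guild1", "guild2", "guild3", "guild4")
)

## All non-empty combinations of the four sets:
combos <- unlist(lapply(seq_along(sets),
                        function(k) combn(names(sets), k, simplify = FALSE)),
                 recursive = FALSE)

## One fixed-effects formula per combination (random effects would
## still need to be appended for a GLMM, e.g. in lme4 syntax):
formulas <- lapply(combos, function(nm)
  reformulate(unlist(sets[nm]), response = "response"))

length(formulas)   # 15 non-empty combinations of the 4 sets
formulas[[15]]     # the full model containing all four sets
```

Each formula can then be passed to the model-fitting call in a loop, which answers the "all possible models over sets" part of the question.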
Re: [R] grouping explanatory variables into sets for GLMM
Have you read An Introduction to R (or other online tutorial)? If not, please do so before posting further here. It sounds like you are missing very basic knowledge -- on factors -- which you need to learn about before proceeding. ?factor gives you the answer you seek, I believe. Cheers, Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." H. Gilbert Welch On Thu, Apr 3, 2014 at 6:54 AM, Maria Kernecker maria.kernec...@mail.mcgill.ca wrote: [quoted text snipped]
Re: [R] grouping explanatory variables into sets for GLMM
Unless there is reason to keep the conversation private, always reply to the list. How will anyone else know that my answer wasn't satisfactory? 1. I don't intend to go through your references. A minimal reproducible example of what you wish to do and what you tried would help. 2. Have you read An Intro to R? Cheers, Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 On Thu, Apr 3, 2014 at 5:14 PM, Maria Kernecker, PhD mkernec...@gmail.com wrote: Thanks for getting back to me. It seems I didn't write my question clearly and that it was misunderstood - even if it is easy to answer: I would like to reduce the number of explanatory variables in my model by using sets or categories that these variables belong to, like Rhodes et al. did in their chapter, or like Lentini et al. 2012 did in their paper. Factor is not the answer I am looking for, unfortunately. [earlier quoted messages snipped]
Re: [R] grouping explanatory variables into sets for GLMM
Reading the Intro, as Bert suggests, would likely solve some of your problems. If you think about how many combinations it would take, using only one variable from each group in any one model, you would see that the number of individual models (12) is not so onerous that you couldn't specify them one at a time. On Apr 3, 2014, at 8:55 PM, Bert Gunter gunter.ber...@gene.com wrote: [quoted text snipped] Don McKenzie Research Ecologist Pacific Wildland Fire Sciences Lab US Forest Service Affiliate Professor School of Environmental and Forest Sciences College of the Environment University of Washington d...@uw.edu
[R] Grouping on a Distance Matrix
Hello, I'm looking for a function that groups elements below a certain distance threshold, based on a distance matrix. In other words, I'd like to group samples without using a standard clustering algorithm on the distance matrix. For example, let the distance matrix be:

   A    B    C    D
A  0    0.03 0.77 1.12
B  0.03 0    1.59 1.11
C  0.77 1.59 0    0.09
D  1.12 1.11 0.09 0

Two clusters would be found with a cutoff of 0.1. The first contains A, B. The second has C, D. Is there an efficient function that does this? I can think of how to do this recursively, but am hoping it's already been considered. -- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
Re: [R] Grouping on a Distance Matrix
You need to re-think. What you said is nonsense. Use an appropriate clustering algorithm. (a can be near b; b can be near c; but a is not near c, using near = closer than threshold) Cheers, Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 On Thu, Feb 13, 2014 at 12:00 AM, Dario Strbenac dstr7...@uni.sydney.edu.au wrote: [quoted text snipped]
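[Editor's note: the threshold-grouping described in the original post corresponds to single-linkage hierarchical clustering cut at the threshold height, with exactly the chaining caveat Bert points out (a near b and b near c chains a and c into one group). A sketch in base R, using the distance matrix from the post:]

```r
## Distance matrix from the original post.
m <- matrix(c(0,    0.03, 0.77, 1.12,
              0.03, 0,    1.59, 1.11,
              0.77, 1.59, 0,    0.09,
              1.12, 1.11, 0.09, 0),
            nrow = 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))

## Single linkage merges any two groups whose closest members are
## within the cutoff; cutree() then cuts the tree at that height.
hc <- hclust(as.dist(m), method = "single")
cutree(hc, h = 0.1)
## A B C D
## 1 1 2 2   -- the two clusters {A,B} and {C,D} from the example
```

The next merge in this tree happens at height 0.77, so any cutoff between 0.09 and 0.77 yields the same two groups.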
[R] Grouping commands so that variables are removed automatically - like functions
Hi I would like to group commands, so that after a group of commands has been executed, the variables defined in that group are automatically deleted. My reasoning: I have a longer script which is used to load data, do analysis and plot graphs, all part of a document (in org-mode / emacs). I have several datasets which are loaded, and each one is quite big. So after doing one part of the job (e.g. analysing the data and storing the results) I want to delete all variables used, to free space and to avoid having these variables being used in the next block while still carrying the old (for this block invalid) values. I can't use rm(list=ls()) as I have some variables as constants which do not change over the whole document, and also some functions defined. I could put each block in a function and then call the function and delete it afterwards, but this is, as I see it, abusing functions. I don't want to keep track manually of the variables. Therefore my question: Can I do something like:

x <- 15
{ # here begins the block
  a <- 1:100
  b <- 4:400
} # here ends the block
# here a and b are not defined anymore
# but x is still defined

{} is great for grouping the commands, but the variables are not deleted afterwards. Am I missing a language feature in R? Rainer -- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany) Centre of Excellence for Invasion Biology Stellenbosch University South Africa Tel: +33 - (0)9 53 10 27 44 Cell: +33 - (0)6 85 62 59 98 Fax: +33 - (0)9 58 10 27 44 Fax (D): +49 - (0)3 21 21 25 22 44 email: rai...@krugs.de Skype: RMkrug __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping commands so that variables are removed automatically - like functions
Check out the use of the 'local' function:

> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 199420 10.7     407500 21.8       35 18.7
Vcells 308004  2.4     786432  6.0   786424  6.0
> result <- local({
+     a <- rnorm(100)  # big objects
+     b <- rnorm(100)
+     mean(a + b)      # return value
+ })
> result
[1] 0.0001819203
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 199666 10.7     407500 21.8       35 18.7
Vcells 308780  2.4    2975200 22.7  3710863 28.4

Jim Holtman Data Munger Guru "What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it." On Mon, Jan 20, 2014 at 8:12 AM, Rainer M Krug rai...@krugs.de wrote: [quoted text snipped]
Re: [R] Grouping commands so that variables are removed automatically - like functions
On 01/20/14, 14:27, jim holtman wrote: > Check out the use of the 'local' function: True - had completely forgotten the local function. Thanks, Rainer [quoted text snipped]
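[Editor's note: a minimal illustration of the accepted answer, tying local() back to the original x/a/b example; local() evaluates its block in a throwaway environment, so the temporaries never reach the global workspace:]

```r
x <- 15                      # a constant that should survive the block
res <- local({
  a <- 1:100                 # temporaries, visible only inside local()
  b <- 4:400
  sum(a) + sum(b)            # the block's value is returned to `res`
})
exists("a")   # FALSE in a fresh session -- `a` was confined to local()
x             # still 15
```

Once the block finishes, `a` and `b` become unreachable and their memory is reclaimed at the next garbage collection, which is exactly the behaviour the original post asked for.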
Re: [R] Grouping Matrix by Columns; OHLC Data
HI, May be this helps:

set.seed(24)
mat1 <- matrix(sample(1:60, 30*24, replace = TRUE), ncol = 24)
colnames(mat1) <- rep(c("O", "H", "L", "C"), 6)
indx <- seq_along(colnames(mat1))
n <- length(unique(colnames(mat1)))
res <- lapply(split(indx, (indx - 1) %% n + 1), function(i) mat1[, i])
lapply(res, head, 2)
#$`1`
#      O  O  O  O  O  O
#[1,] 18 56 51 24 24 52
#[2,] 14 31 60 12 43 34
#
#$`2`
#      H  H  H  H  H  H
#[1,] 20  6  4 23 10  2
#[2,] 15 37 22 52 30 42
#
#$`3`
#      L  L  L  L  L  L
#[1,] 30 25 29  1 57 16
#[2,] 15 23 15 10 44 60
#
#$`4`
#      C  C  C  C  C  C
#[1,] 20 13  8 44  5 13
#[2,] 45 17 35  8 25 12

A.K. Motivation: Bring in data containing a number of columns divisible by 4. This data contains several different assets and the columns correspond to Open, High, Low, Close, Open, High, Low, Close, etc. (thus divisible by 4). From where I am getting this data, the header is not labeled as Open, High, Low, Close, but rather just has the asset symbol. The end goal is to have each Open, High, Low, Close as its own OHLC object, to be run through different volatility functions (via quantmod). I believe I am best served by first grouping the original data so that each asset is its own object, with 4 columns. Then I can rename the columns: colnames(asset) <- c("Open", "High", "Low", "Close"). I've attempted to use split, but am having trouble with split along the columns. Obviously I could manipulate the indexing, with something like data[i:i+4], and use a loop. Maybe this indexing approach would work with use of apply(). Previously, I've been using Mathematica for most of my data manipulation, and there I would partition the entire data set, i.e. matrix, into column# / 4 separate objects. So, in that case I have a 3 dimensional object. I'd then call the object by its 3rd dimension index # [][#]. I'm having trouble doing that here. Any thoughts, or at the least help with grouping the data by column. For the sake of possible examples, let's say the dimensions of my data are n.rows = 30, n.col = 24
Re: [R] Grouping Matrix by Columns; OHLC Data
Hi Jake. Sorry, I misunderstood what you wanted. Instead of this:

lapply(split(indx, (indx - 1) %% n + 1), function(i) mat1[, i])

if I use:

res1 <- lapply(split(indx, (indx - 1) %/% n + 1), function(i) mat1[, i])
# or
lapply(split(indx, as.numeric(gl(ncol(mat1)/n, n, ncol(mat1)))), function(i) mat1[, i])

lapply(res1, head, 2)[1:2]
#$`1`
#      O  H  L  C
#[1,] 18 20 30 20
#[2,] 14 15 15 45
#
#$`2`
#      O  H  L  C
#[1,] 56  6 25 13
#[2,] 31 37 23 17

A.K. So, I got it worked out. Thanks for your input. I see that you used a mod, which worked well for the application which you solved, and an application that will likely come up again. Anyways, here is the solution I was looking for:

set.seed(24)
mat1 <- matrix(sample(1:60, 30*24, replace = TRUE), ncol = 24)
colnames(mat1) <- rep(c("O", "H", "L", "C"), 6)
indx <- seq_along(colnames(mat1))
n <- length(unique(colnames(mat1)))
res <- lapply(split(indx, rep(1:6, each = 4, times = 1)), function(i) mat1[, i])
## rep(1:6, each = 4, times = 1)
## [1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6

lapply(res, head, 2)
$`1`
      O  H  L  C
[1,] 18 20 30 20
[2,] 14 15 15 45

$`2`
      O  H  L  C
[1,] 56  6 25 13
[2,] 31 37 23 17

$`3`
      O  H  L  C
[1,] 51  4 29  8
[2,] 60 22 15 35

$`4`
      O  H  L  C
[1,] 24 23  1 44
[2,] 12 52 10  8

$`5`
      O  H  L  C
[1,] 24 10 57  5
[2,] 43 30 44 25

$`6`
      O  H  L  C
[1,] 52  2 16 13
[2,] 34 42 60 12

Thanks again - Original Message - From: arun smartpink...@yahoo.com To: R help r-help@r-project.org Sent: Thursday, September 26, 2013 5:15 PM Subject: Re: Grouping Matrix by Columns; OHLC Data [quoted text snipped]
[R] Grouping variables by an irregular time interval
Hello all, I have a very large data frame (more than 5 million lines) as below (dput example at the end of mail):

Station Antenna Tag            DateTime Power Events
      1       2 999 22/07/2013 11:00:21    17      1
      1       2 999 22/07/2013 11:33:47    31      1
      1       2 999 22/07/2013 11:34:00    19      1
      1       2 999 22/07/2013 11:34:16    53      1
      1       2 999 22/07/2013 11:43:20    15      1
      1       2 999 22/07/2013 11:43:35    17      1

For each Tag, in each Antenna, in each Station, I need to create a 10 min interval and sum the number of Events and take the mean of Power in the time interval, as below (complete wanted output at the end of mail):

Station Antenna Tag       StartDateTime         EndDateTime Power Events
      1       2 999 22/07/2013 11:00:21 22/07/2013 11:00:21    17      1
      1       2 999 22/07/2013 11:34:16 22/07/2013 11:43:35    27      5
      1       2 999 22/07/2013 11:44:35 22/07/2013 11:45:40    17     14
      2       1   1 25/07/2013 14:19:45 25/07/2013 14:20:39    65      4
      2       1   2 25/07/2013 14:20:13 25/07/2013 14:25:14    21      3
      2       1   4 25/07/2013 14:20:46 25/07/2013 14:20:46    28      1

Showing the start and end points of each interval is optional, not necessary. I put both to show the irregular time interval: look at Tag 999: the first interval is between 11:00 and 11:10, the second between 11:34 and 11:44, and the third between 11:44 and 11:45. First I tried a for-loop, without success. After that, I tried this code:

require(plyr)
ddply(data, .(Station, Antenna, Tag, cut(data$DateTime, "10 min")),
      summarise, Power = round(mean(Power), 0), Events = sum(Events))

This is almost what I want, but cut() divides the data into regular time intervals, which in some cases I do not have, and it splits a unique observation in two. Any ideas to solve this issue?
Re: [R] Grouping variables by an irregular time interval
Arun called my attention to a mistake I made in the example data set. I am now sending the correct one, with the same text explaining my problem. Sorry, all of you, for the confusion. I have a very large data frame (more than 5 million lines) as below (dput example at the end of mail):

  Station Antenna Tag            DateTime Power Events
1       1       2 999 22/07/2013 11:00:21    17      1
2       1       2 999 22/07/2013 11:33:47    31      1
3       1       2 999 22/07/2013 11:34:00    19      1
4       1       2 999 22/07/2013 11:34:16    53      1
5       1       2 999 22/07/2013 11:43:20    15      1
6       1       2 999 22/07/2013 11:43:35    17      1

For each Tag, in each Antenna, in each Station, I need to create a 10 min interval and take the sum of Events and the mean of Power within the time interval, as below (complete wanted output at the end of mail).

  Station Antenna Tag       StartDateTime         EndDateTime Power Events
1       1       2 999 22/07/2013 11:00:21 22/07/2013 11:00:21    17      1
2       1       2 999 22/07/2013 11:34:16 22/07/2013 11:43:35    27      5
3       1       2 999 22/07/2013 11:44:35 22/07/2013 11:45:40    17     14
4       2       1   1 25/07/2013 14:19:45 25/07/2013 14:20:39    65      4
5       2       1   2 25/07/2013 14:20:13 25/07/2013 14:25:14    21      3
6       2       1   4 25/07/2013 14:20:46 25/07/2013 14:20:46    28      1

Showing the start and end points of each interval is optional, not necessary. I put both in to show the irregular time intervals (look at tag 999). First I tried a for-loop, without success. After that, I tried this code:

require(plyr)
ddply(data, .(Station, Antenna, Tag, cut(data$DateTime, "10 min")), summarise,
      Power = round(mean(Power), 0), Events = sum(Events))

This is almost what I want, but cut() divides the data into regular time intervals; in some cases that is not what I have, and it splits a single observation in two. Any ideas to solve this issue?
R version 3.0.1 (2013-05-16) -- Good Sport Platform: x86_64-w64-mingw32/x64 (64-bit) Windows 7 Professional Thanks in advanced, Raoni -- Raoni Rosa Rodrigues Research Associate of Fish Transposition Center CTPeixes Universidade Federal de Minas Gerais - UFMG Brasil rodrigues.ra...@gmail.com ##complete data dput structure(list(Station = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Antenna = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Tag = c(999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 4L, 18L, 18L, 18L, 21L, 22L, 36L, 36L, 36L, 36L, 36L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L), DateTime = structure(c(6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 
20L, 21L, 22L, 23L, 24L, 25L, 68L, 70L, 72L, 73L, 71L, 75L, 86L, 74L, 64L, 64L, 65L, 87L, 67L, 1L, 2L, 3L, 4L, 5L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 66L, 69L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 88L, 89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L, 105L, 106L, 107L, 108L, 109L, 110L, 111L, 112L, 113L, 114L, 115L, 116L, 117L), .Label = c(19/06/2013 22:15:49, 19/06/2013 22:15:54, 19/06/2013 22:15:59, 19/06/2013 22:16:24, 19/06/2013 22:16:29, 22/07/2013 11:00:21, 22/07/2013 11:33:47, 22/07/2013 11:34:00, 22/07/2013 11:34:16, 22/07/2013 11:43:20, 22/07/2013 11:43:35, 22/07/2013 11:44:35, 22/07/2013 11:44:41, 22/07/2013 11:44:42, 22/07/2013 11:44:43, 22/07/2013 11:44:44, 22/07/2013 11:44:59, 22/07/2013 11:45:11, 22/07/2013 11:45:29, 22/07/2013 11:45:30, 22/07/2013 11:45:31, 22/07/2013 11:45:35, 22/07/2013 11:45:37,
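A gap-based grouping may avoid cut()'s fixed bin edges. The sketch below (base R, not from the thread) assumes a new interval should start whenever consecutive detections of the same Station/Antenna/Tag are more than 10 minutes apart; on the first six rows of the example it yields one single-detection interval plus one interval with 5 events and a mean Power of 27, as in the wanted output for tag 999:

```r
# Sketch with the first six rows of the example data.
det <- data.frame(
  Station = 1, Antenna = 2, Tag = 999,
  DateTime = as.POSIXct(c("2013-07-22 11:00:21", "2013-07-22 11:33:47",
                          "2013-07-22 11:34:00", "2013-07-22 11:34:16",
                          "2013-07-22 11:43:20", "2013-07-22 11:43:35")),
  Power = c(17, 31, 19, 53, 15, 17), Events = 1
)

# Sort, then flag the start of a new interval: either the grouping key
# changes or the gap to the previous detection exceeds 600 seconds.
det <- det[order(det$Station, det$Antenna, det$Tag, det$DateTime), ]
key <- paste(det$Station, det$Antenna, det$Tag)
newgrp <- c(TRUE, key[-1] != key[-nrow(det)] |
                  diff(as.numeric(det$DateTime)) > 600)
det$Interval <- cumsum(newgrp)

# Summarise each irregular interval.
out <- do.call(rbind, lapply(split(det, det$Interval), function(d)
  data.frame(Station = d$Station[1], Antenna = d$Antenna[1], Tag = d$Tag[1],
             StartDateTime = min(d$DateTime), EndDateTime = max(d$DateTime),
             Power = round(mean(d$Power)), Events = sum(d$Events))))
out
```

Whether 10 minutes should be measured from the previous detection (as here) or from the interval start is a design choice the original post leaves open.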
[R] Grouping variables by an irregular time interval
Hello all, I have a very large data frame (more than 5 million lines) as below (dput example at the end of mail):

  Station Antenna Tag            DateTime Power Events
1       1       2 999 22/07/2013 11:00:21    17      1
2       1       2 999 22/07/2013 11:33:47    31      1
3       1       2 999 22/07/2013 11:34:00    19      1
4       1       2 999 22/07/2013 11:34:16    53      1
5       1       2 999 22/07/2013 11:43:20    15      1
6       1       2 999 22/07/2013 11:43:35    17      1

For each Tag, in each Antenna, in each Station, I need to create a 10 min interval and take the sum of Events and the mean of Power within the time interval, as below (complete wanted output at the end of mail).

  Station Antenna Tag       StartDateTime         EndDateTime Power Events
1       1       2 999 22/07/2013 11:00:21 22/07/2013 11:00:21    17      1
2       1       2 999 22/07/2013 11:34:16 22/07/2013 11:43:35    27      5
3       1       2 999 22/07/2013 11:44:35 22/07/2013 11:45:40    17     14
4       2       1   1 25/07/2013 14:19:45 25/07/2013 14:20:39    65      4
5       2       1   2 25/07/2013 14:20:13 25/07/2013 14:25:14    21      3
6       2       1   4 25/07/2013 14:20:46 25/07/2013 14:20:46    28      1

Showing the start and end points of each interval is optional, not necessary. I put both in to show the irregular time intervals (look at tag 999). First I tried a for-loop, without success. After that, I tried this code:

require(plyr)
ddply(data, .(Station, Antenna, Tag, cut(data$DateTime, "10 min")), summarise,
      Power = round(mean(Power), 0), Events = sum(Events))

This is almost what I want, but cut() divides the data into regular time intervals; in some cases that is not what I have, and it splits a single observation in two. Any ideas to solve this issue?
R version 3.0.1 (2013-05-16) -- Good Sport Platform: x86_64-w64-mingw32/x64 (64-bit) Windows 7 Professional Thanks in advanced, Raoni -- Raoni Rosa Rodrigues Research Associate of Fish Transposition Center CTPeixes Universidade Federal de Minas Gerais - UFMG Brasil rodrigues.ra...@gmail.com ##complete data dput structure(list(Station = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Antenna = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Tag = c(999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 4L, 18L, 18L, 18L, 21L, 22L, 36L, 36L, 36L, 36L, 36L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L), DateTime = structure(c(3L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 18L, 19L, 19L, 19L, 19L, 20L, 23L, 19L, 17L, 17L, 17L, 23L, 18L, 1L, 1L, 1L, 2L, 2L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 18L, 19L, 21L, 21L, 21L, 21L, 21L, 22L, 22L, 22L, 22L, 22L, 23L, 24L, 24L, 24L, 24L, 24L, 24L, 25L, 25L, 25L, 25L, 25L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 27L, 27L, 27L, 27L, 27L, 27L, 28L, 28L, 28L, 28L, 28L), .Label = c(19/06/2013 22:15, 19/06/2013 22:16, 22/07/2013 11:00, 22/07/2013 11:33, 22/07/2013 11:34, 22/07/2013 11:43, 22/07/2013 11:44, 22/07/2013 11:45, 25/07/2013 14:10, 25/07/2013 14:11, 25/07/2013 14:12, 25/07/2013 14:13, 25/07/2013 14:14, 25/07/2013 14:15, 25/07/2013 14:16, 25/07/2013 14:17, 25/07/2013 14:18, 25/07/2013 14:19, 25/07/2013 14:20, 25/07/2013 14:21, 25/07/2013 14:23, 25/07/2013 14:24, 25/07/2013 14:25, 25/07/2013 14:26, 25/07/2013 14:27, 25/07/2013 14:28, 25/07/2013 14:29, 25/07/2013 14:30), class = factor), Power = c(17L, 31L, 19L, 53L, 15L, 17L, 21L, 12L, 15L, 22L, 19L, 15L, 13L, 14L, 15L, 12L, 23L, 19L, 16L, 20L, 30L, 37L, 25L, 167L, 24L, 14L,
Re: [R] grouping followed by finding frequent patterns in R
1. Please cc the list, as I have here, unless your comments are off topic. 2. Use dput() (?dput) to include **small** amounts of data in your message, as attachments are generally stripped from r-help. 3. I have no experience with itemsets or the arules package, but a quick glance at the docs there said that your data argument must be in a specific form coercible into an S4 transactions class. I suspect that neither your initial data frame nor the list derived from split() is, but maybe someone familiar with the package can tell you for sure. That's why you need to cc the list. -- Bert

On Sun, Mar 10, 2013 at 7:04 AM, Dhiman Biswas crazydh...@gmail.com wrote: Dear Bert, My intention is to mine frequent itemsets of TRN_TYP for individual CIN out of that data. But the problem is that using eclat() after splitting gives the following error:

Error in eclat(list) : internal error in trio library

PS: I have attached my dataset.

On Sat, Mar 9, 2013 at 8:27 PM, Bert Gunter gunter.ber...@gene.com wrote: I **suggest** that you explain what you wish to accomplish using a reproducible example rather than telling us what packages you think you should use. I believe you are making things too complicated; e.g. what do you mean by "frequent patterns"? Moreover, "basket format" is rather unclear -- and may well be unnecessary. But using lists, it could be simply accomplished by ?split ## as in

the_list <- with(yourdata, split(TYP, CIN.TRN))

or possibly

the_list <- with(yourdata, tapply(TYP, CIN.TRN, FUN = table))

Of course, these may be irrelevant and useless, but without knowing your purpose ...? -- Bert

On Sat, Mar 9, 2013 at 4:37 AM, Dhiman Biswas crazydh...@gmail.com wrote: I have data in the following form:

CIN     TRN_TYP
9079954 1
9079954 2
9079954 3
9079954 4
9079954 5
9079954 4
9079954 5
9079954 6
9079954 7
9079954 8
9079954 9
9079954 9
..      ..

There are 100 types of CIN (9079954, 12441087, 15246633, ...)
and respective TRN_TYP. First of all, I want this data to be grouped into basket format:

9079954   1, 2, 3, 4, 5, ...
12441087  19, 14, 21, 3, 7, ...
.
.
.

and then apply eclat() from the arules package to find frequent patterns.

1) I ran the following code:

file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
file <- file[!duplicated(file), ]
eclat(split(file$TRN_TYP, file$CIN))

but it gave me the following error:

Error in asMethod(object) : can not coerce list with transactions with duplicated items

2) I ran this code:

file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
file_new <- file[, c(3, 6)] # because my file Data_Input_NUM has many other columns as well, so I am selecting only CIN and TRN_TYP
file_new <- file_new[!duplicated(file_new), ]
eclat(split(file_new$TRN_TYP, file_new$CIN))

but again:

Error in eclat(split(file_new$TRN_TYP, file_new$CIN)) : internal error in trio library

PLEASE HELP

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
[R] grouping followed by finding frequent patterns in R
I have data in the following form:

CIN     TRN_TYP
9079954 1
9079954 2
9079954 3
9079954 4
9079954 5
9079954 4
9079954 5
9079954 6
9079954 7
9079954 8
9079954 9
9079954 9
..      ..

There are 100 types of CIN (9079954, 12441087, 15246633, ...) and respective TRN_TYP. First of all, I want this data to be grouped into basket format:

9079954   1, 2, 3, 4, 5, ...
12441087  19, 14, 21, 3, 7, ...
.
.
.

and then apply eclat() from the arules package to find frequent patterns.

1) I ran the following code:

file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
file <- file[!duplicated(file), ]
eclat(split(file$TRN_TYP, file$CIN))

but it gave me the following error:

Error in asMethod(object) : can not coerce list with transactions with duplicated items

2) I ran this code:

file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
file_new <- file[, c(3, 6)] # because my file Data_Input_NUM has many other columns as well, so I am selecting only CIN and TRN_TYP
file_new <- file_new[!duplicated(file_new), ]
eclat(split(file_new$TRN_TYP, file_new$CIN))

but again:

Error in eclat(split(file_new$TRN_TYP, file_new$CIN)) : internal error in trio library

PLEASE HELP
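For what it's worth, the "duplicated items" coercion error usually means some CIN repeats the same TRN_TYP, so deduplicating each basket before coercion is a common fix. A base-R sketch with toy data in the same long format as the post (the arules calls are left commented out, since having that package installed is an assumption here):

```r
# Toy data: one row per (CIN, TRN_TYP); note CIN 9079954 repeats item 1.
dat <- data.frame(CIN     = c(9079954, 9079954, 9079954, 12441087, 12441087),
                  TRN_TYP = c(1, 2, 1, 19, 14))

# Basket format: one character vector of *unique* item labels per CIN.
baskets <- lapply(split(dat$TRN_TYP, dat$CIN),
                  function(x) unique(as.character(x)))
baskets

# With arules installed, the deduplicated baskets can then be mined:
# library(arules)
# trans <- as(baskets, "transactions")
# itemsets <- eclat(trans, parameter = list(supp = 0.1))
# inspect(itemsets)
```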
Re: [R] grouping followed by finding frequent patterns in R
I **suggest** that you explain what you wish to accomplish using a reproducible example rather than telling us what packages you think you should use. I believe you are making things too complicated; e.g. what do you mean by "frequent patterns"? Moreover, "basket format" is rather unclear -- and may well be unnecessary. But using lists, it could be simply accomplished by ?split ## as in

the_list <- with(yourdata, split(TYP, CIN.TRN))

or possibly

the_list <- with(yourdata, tapply(TYP, CIN.TRN, FUN = table))

Of course, these may be irrelevant and useless, but without knowing your purpose ...? -- Bert

On Sat, Mar 9, 2013 at 4:37 AM, Dhiman Biswas crazydh...@gmail.com wrote: I have data in the following form:

CIN     TRN_TYP
9079954 1
9079954 2
9079954 3
9079954 4
9079954 5
9079954 4
9079954 5
9079954 6
9079954 7
9079954 8
9079954 9
9079954 9
..      ..

There are 100 types of CIN (9079954, 12441087, 15246633, ...) and respective TRN_TYP. First of all, I want this data to be grouped into basket format:

9079954   1, 2, 3, 4, 5, ...
12441087  19, 14, 21, 3, 7, ...
.
.
.

and then apply eclat() from the arules package to find frequent patterns.
1) I ran the following code:

file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
file <- file[!duplicated(file), ]
eclat(split(file$TRN_TYP, file$CIN))

but it gave me the following error:

Error in asMethod(object) : can not coerce list with transactions with duplicated items

2) I ran this code:

file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
file_new <- file[, c(3, 6)] # because my file Data_Input_NUM has many other columns as well, so I am selecting only CIN and TRN_TYP
file_new <- file_new[!duplicated(file_new), ]
eclat(split(file_new$TRN_TYP, file_new$CIN))

but again:

Error in eclat(split(file_new$TRN_TYP, file_new$CIN)) : internal error in trio library

PLEASE HELP

-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
[R] grouping elements of a data frame
Hi everyone, I have a question on selecting and grouping elements of a data frame. For example: A.df- [ a c 0.9 b x 0.8 b z 0.5 c y 0.9 c x 0.7 c z 0.6] I want to create a list of a data frame that gives me the unique values of column 1 of A.df so that i can create intersects. That is: B[a]- [ c 0.9] B[b]- [ x 0.8 z 0.5] B[c]- [ y 0.9 x 0.7 z 0.6] B[c] n B[b] - c(x,z) How can I accomplish this? Thanks, Al
Re: [R] grouping elements of a data frame
On Jan 15, 2013, at 9:10 AM, Nuri Alpay Temiz wrote: Hi everyone, I have a question on selecting and grouping elements of a data frame. For example: A.df- [ a c 0.9 b x 0.8 b z 0.5 c y 0.9 c x 0.7 c z 0.6] That is not R code. Matlab? Python? I want to create a list of a data frame that gives me the unique values of column 1 of A.df so that i can create intersects. That is: B[a]- [ c 0.9] B[b]- [ x 0.8 z 0.5] B[c]- [ y 0.9 x 0.7 z 0.6] B[c] n B[b] - c(x,z) That's some sort of coded message? Are we supposed to know what the "n" operation will do when assigned a vector? Assuming you really do have a data frame named B:

intersect(B$c, B$b)

Please code up examples in R in the future. -- David Winsemius Alameda, CA, USA
Re: [R] grouping elements of a data frame
Hi, Try this (the last part was not clear):

A.df <- read.table(text = "
a c 0.9
b x 0.8
b z 0.5
c y 0.9
c x 0.7
c z 0.6
", sep = "", header = FALSE, stringsAsFactors = FALSE)

lst1 <- split(A.df[, -1], A.df$V1)
lst1
#$a
#  V2  V3
#1  c 0.9
#
#$b
#  V2  V3
#2  x 0.8
#3  z 0.5
#
#$c
#  V2  V3
#4  y 0.9
#5  x 0.7
#6  z 0.6

A.K.

- Original Message - From: Nuri Alpay Temiz alpayte...@outlook.com To: R-help@r-project.org Cc: Sent: Tuesday, January 15, 2013 12:10 PM Subject: [R] grouping elements of a data frame Hi everyone, I have a question on selecting and grouping elements of a data frame. For example: A.df- [ a c 0.9 b x 0.8 b z 0.5 c y 0.9 c x 0.7 c z 0.6] I want to create a list of a data frame that gives me the unique values of column 1 of A.df so that i can create intersects. That is: B[a]- [ c 0.9] B[b]- [ x 0.8 z 0.5] B[c]- [ y 0.9 x 0.7 z 0.6] B[c] n B[b] - c(x,z) How can I accomplish this? Thanks, Al
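The last part of the question (the "B[c] n B[b]" intersection) then follows directly from the same split() result; a base-R sketch using intersect():

```r
# Rebuild the example data and the split list from the reply above.
A.df <- read.table(text = "
a c 0.9
b x 0.8
b z 0.5
c y 0.9
c x 0.7
c z 0.6
", sep = "", header = FALSE, stringsAsFactors = FALSE)

lst1 <- split(A.df[, -1], A.df$V1)

# Labels common to groups 'b' and 'c': the intersection the poster wanted.
common <- intersect(lst1$b$V2, lst1$c$V2)
common
# "x" "z"
```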
[R] Grouping distances
Hi R-listers, I am trying to group my HTL data; this is a column of distances to the HTL in the data set turtlehatch. I would like to create an index of distances (0-5 m, 6-10, 11-15, 16-20, ... up to 60), and then create a new file with this HTLIndex in a column. So far I have gotten this far:

HTL.index <- function(values, weights = c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)) {
  hope <- values * weights
  return(apply(hope, 1, sum) / apply(values, 1, sum))
}
write.csv(turtlehatch, "HTLIndex", row.names = FALSE)

But I do not seem to be able to create a new column in a new file. Please advise, Jean -- View this message in context: http://r.789695.n4.nabble.com/Grouping-distances-tp4632985.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Grouping distances
Hello, It's easy to create a new column. Since you haven't said where, nor which type of data structure you are using, I'll try to answer both. Suppose that 'x' is a matrix. Then

newcolumn <- newvalues
x2 <- cbind(x, newcolumn) # new column added to x, result in x2

Suppose that 'y' is a data.frame. Then the same would do it, or

y$newcolumn <- newvalues

Now, I believe that the new values come from your function. If so, you must assign the function value to some variable outside the function:

htlindex <- HTL.index(...etc...) # 'htlindex' is the 'newvalues' above

Two extra notes. One, rowSums() does what your apply() instructions do. Two, you first multiply and then divide, to apply the 'weights'; I assume this is just an example, not the real function. Hope this helps, Rui Barradas

Em 11-06-2012 07:01, Jhope escreveu: Hi R-listers, I am trying to group my HTL data; this is a column of distances to the HTL in the data set turtlehatch. I would like to create an index of distances (0-5 m, 6-10, 11-15, 16-20, ... up to 60), and then create a new file with this HTLIndex in a column. So far I have gotten this far:

HTL.index <- function(values, weights = c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)) {
  hope <- values * weights
  return(apply(hope, 1, sum) / apply(values, 1, sum))
}
write.csv(turtlehatch, "HTLIndex", row.names = FALSE)

But I do not seem to be able to create a new column in a new file. Please advise, Jean -- View this message in context: http://r.789695.n4.nabble.com/Grouping-distances-tp4632985.html Sent from the R help mailing list archive at Nabble.com.
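For the binning itself (0-5, 6-10, ... up to 60), base R's cut() may be simpler than a weighted-index function; a sketch, where the distance values and output file name are made-up assumptions, not from the original post:

```r
# Hypothetical distances to the HTL, in metres.
turtlehatch <- data.frame(Distance = c(3, 7, 12, 33, 60))

# Bin into 0-5, 5-10, ..., 55-60 and keep the class as a new column.
turtlehatch$HTLIndex <- cut(turtlehatch$Distance,
                            breaks = seq(0, 60, by = 5),
                            include.lowest = TRUE)
turtlehatch

# Then write the augmented data frame to a new file (name is an assumption):
# write.csv(turtlehatch, "turtlehatch_indexed.csv", row.names = FALSE)
```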
Re: [R] Grouping distances
Thank you Rui, I am trying to create a column in the data file turtlehatch.csv. Saludos, Jean
[R] grouping function
Hello, I would like to write a function that makes a grouping variable for some panel data. The grouping variable is made conditional on the begin year and the end year. Here is the code I have written so far.

name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));
df <- data.frame(name, begin, end);
df;

#This is the part I am stuck on;
makegroup <- function(x,y) {
  group <- 0
  if (x <= 1990 & y > 1990) {group==1}
  if (x <= 1991 & y > 1991) {group==2}
  if (x <= 1992 & y > 1992) {group==3}
  return(x,y)
}
makegroup(df$begin,df$end);

#I am looking for output where each observation belongs to a group conditional on the begin year and end year. I would also like to use a for loop for programming accuracy as well;

Thank you! Geoff
Re: [R] grouping function
Hi, On Tue, May 8, 2012 at 2:17 PM, Geoffrey Smith g...@asu.edu wrote: Hello, I would like to write a function that makes a grouping variable for some panel data. The grouping variable is made conditional on the begin year and the end year. Here is the code I have written so far.

name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));
df <- data.frame(name, begin, end);
df;

Thanks for providing reproducible data. Two minor points: you don't need ';' at the end of lines, and calling your data frame "df" is confusing because there's a df() function.

#This is the part I am stuck on;
makegroup <- function(x,y) {
  group <- 0
  if (x <= 1990 & y > 1990) {group==1}
  if (x <= 1991 & y > 1991) {group==2}
  if (x <= 1992 & y > 1992) {group==3}
  return(x,y)
}
makegroup(df$begin,df$end);

#I am looking for output where each observation belongs to a group conditional on the begin year and end year. I would also like to use a for loop for programming accuracy as well;

This isn't a clear specification: (1990, 1994) for instance fits into all three groups. Do you want to extend this to more start years, or are you only interested in those three? Assuming end is always >= start, you don't even need to consider the end years in your grouping. Here are two methods, one that looks like your pseudocode, and one that is more R-ish. They give different results because of different handling of cases that fit all three groups. Rearranging the statements in makegroup1() from broadest to most restrictive would make it give the same result as makegroup2().
makegroup1 <- function(x,y) {
  group <- numeric(length(x))
  group[x <= 1990 & y > 1990] <- 1
  group[x <= 1991 & y > 1991] <- 2
  group[x <= 1992 & y > 1992] <- 3
  group
}

makegroup2 <- function(x, y) {
  ifelse(x <= 1990 & y > 1990, 1,
         ifelse(x <= 1991 & y > 1991, 2,
                ifelse(x <= 1992 & y > 1992, 3, 0)))
}

makegroup1(df$begin,df$end)
[1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
makegroup2(df$begin,df$end)
[1] 1 2 3 NA NA 2 3 NA NA NA 3 NA NA NA NA

df

But really, it's a better idea to develop an unambiguous statement of your desired output. Sarah -- Sarah Goslee http://www.functionaldiversity.org
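Sarah's remark about rearranging makegroup1() can be made concrete: assigning from broadest to most restrictive condition lets the narrowest matching group win, matching makegroup2() with a 0 default. A sketch (the name makegroup1b and the data frame name panel are hypothetical):

```r
# The thread's example data; 'panel' avoids masking the df() function.
name  <- c(rep('Frank', 5), rep('Tony', 5), rep('Edward', 5))
begin <- c(seq(1990, 1994), seq(1991, 1995), seq(1992, 1996))
end   <- c(seq(1995, 1999), seq(1995, 1999), seq(1996, 2000))
panel <- data.frame(name, begin, end)

# Assignments ordered broadest first, so the last (narrowest) matching
# condition overwrites the earlier ones, as first-match-wins ifelse() does.
makegroup1b <- function(x, y) {
  group <- numeric(length(x))
  group[x <= 1992 & y > 1992] <- 3
  group[x <= 1991 & y > 1991] <- 2
  group[x <= 1990 & y > 1990] <- 1
  group
}

makegroup1b(panel$begin, panel$end)
# 1 2 3 0 0 2 3 0 0 0 3 0 0 0 0
```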
Re: [R] grouping function
Sorry, yes: I changed it before posting to more closely match the default value in the pseudocode. That's a very minor issue: the very last value in the nested ifelse() statements is what's used by default. Sarah

On Tue, May 8, 2012 at 2:46 PM, arun smartpink...@yahoo.com wrote: HI Sarah, I ran the same code from your reply email. For makegroup2, the results are 0 in place of NA.

makegroup1 <- function(x,y) {
+ group <- numeric(length(x))
+ group[x <= 1990 & y > 1990] <- 1
+ group[x <= 1991 & y > 1991] <- 2
+ group[x <= 1992 & y > 1992] <- 3
+ group
+ }
makegroup2 <- function(x, y) {
+ ifelse(x <= 1990 & y > 1990, 1,
+ ifelse(x <= 1991 & y > 1991, 2,
+ ifelse(x <= 1992 & y > 1992, 3, 0)))
+ }
makegroup1(df$begin,df$end)
[1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
makegroup2(df$begin,df$end)
[1] 1 2 3 0 0 2 3 0 0 0 3 0 0 0 0

A. K.

- Original Message - From: Sarah Goslee sarah.gos...@gmail.com To: g...@asu.edu Cc: r-help@r-project.org Sent: Tuesday, May 8, 2012 2:33 PM Subject: Re: [R] grouping function

Hi, On Tue, May 8, 2012 at 2:17 PM, Geoffrey Smith g...@asu.edu wrote: Hello, I would like to write a function that makes a grouping variable for some panel data. The grouping variable is made conditional on the begin year and the end year. Here is the code I have written so far.

name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));
df <- data.frame(name, begin, end);
df;

Thanks for providing reproducible data. Two minor points: you don't need ';' at the end of lines, and calling your data frame "df" is confusing because there's a df() function.

#This is the part I am stuck on;
makegroup <- function(x,y) {
  group <- 0
  if (x <= 1990 & y > 1990) {group==1}
  if (x <= 1991 & y > 1991) {group==2}
  if (x <= 1992 & y > 1992) {group==3}
  return(x,y)
}
makegroup(df$begin,df$end);

#I am looking for output where each observation belongs to a group conditional on the begin year and end year. I would also like to use a for loop for programming accuracy as well;

This isn't a clear specification: (1990, 1994) for instance fits into all three groups. Do you want to extend this to more start years, or are you only interested in those three? Assuming end is always >= start, you don't even need to consider the end years in your grouping. Here are two methods, one that looks like your pseudocode, and one that is more R-ish. They give different results because of different handling of cases that fit all three groups. Rearranging the statements in makegroup1() from broadest to most restrictive would make it give the same result as makegroup2().

makegroup1 <- function(x,y) {
  group <- numeric(length(x))
  group[x <= 1990 & y > 1990] <- 1
  group[x <= 1991 & y > 1991] <- 2
  group[x <= 1992 & y > 1992] <- 3
  group
}

makegroup2 <- function(x, y) {
  ifelse(x <= 1990 & y > 1990, 1,
         ifelse(x <= 1991 & y > 1991, 2,
                ifelse(x <= 1992 & y > 1992, 3, 0)))
}

makegroup1(df$begin,df$end)
[1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
makegroup2(df$begin,df$end)
[1] 1 2 3 NA NA 2 3 NA NA NA 3 NA NA NA NA

df

But really, it's a better idea to develop an unambiguous statement of your desired output. Sarah -- Sarah Goslee http://www.functionaldiversity.org
Re: [R] grouping function
Hi Sarah,
I ran the same code from your reply email. For makegroup2, the results are 0 in place of NA.

makegroup1 <- function(x, y) {
+   group <- numeric(length(x))
+   group[x <= 1990 & y > 1990] <- 1
+   group[x <= 1991 & y > 1991] <- 2
+   group[x <= 1992 & y > 1992] <- 3
+   group
+ }
makegroup2 <- function(x, y) {
+   ifelse(x <= 1990 & y > 1990, 1,
+     ifelse(x <= 1991 & y > 1991, 2,
+       ifelse(x <= 1992 & y > 1992, 3, 0)))
+ }
makegroup1(df$begin, df$end)
[1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
makegroup2(df$begin, df$end)
[1] 1 2 3 0 0 2 3 0 0 0 3 0 0 0 0

A. K.

----- Original Message -----
From: Sarah Goslee sarah.gos...@gmail.com
To: g...@asu.edu
Cc: r-help@r-project.org
Sent: Tuesday, May 8, 2012 2:33 PM
Subject: Re: [R] grouping function

Hi,

On Tue, May 8, 2012 at 2:17 PM, Geoffrey Smith g...@asu.edu wrote:
Hello, I would like to write a function that makes a grouping variable for some panel data. The grouping variable is made conditional on the begin year and the end year. Here is the code I have written so far.

name <- c(rep('Frank', 5), rep('Tony', 5), rep('Edward', 5));
begin <- c(seq(1990, 1994), seq(1991, 1995), seq(1992, 1996));
end <- c(seq(1995, 1999), seq(1995, 1999), seq(1996, 2000));
df <- data.frame(name, begin, end);
df;

Thanks for providing reproducible data. Two minor points: you don't need ; at the end of lines, and calling your data frame df is confusing because there's a df() function.

# This is the part I am stuck on;
makegroup <- function(x, y) {
  group <- 0
  if (x <= 1990 & y > 1990) {group == 1}
  if (x <= 1991 & y > 1991) {group == 2}
  if (x <= 1992 & y > 1992) {group == 3}
  return(x, y)
}
makegroup(df$begin, df$end);
# I am looking for output where each observation belongs to a group conditional on the begin year and end year.

I would also like to use a for loop for programming accuracy as well;

This isn't a clear specification: (1990, 1994), for instance, fits into all three groups. Do you want to extend this to more start years, or are you only interested in those three?
Assuming end is always >= start, you don't even need to consider the end years in your grouping.

Here are two methods, one that looks like your pseudocode, and one that is more R-ish. They give different results because of different handling of cases that fit all three groups. Rearranging the statements in makegroup1() from broadest to most restrictive would make it give the same result as makegroup2().

makegroup1 <- function(x, y) {
  group <- numeric(length(x))
  group[x <= 1990 & y > 1990] <- 1
  group[x <= 1991 & y > 1991] <- 2
  group[x <= 1992 & y > 1992] <- 3
  group
}

makegroup2 <- function(x, y) {
  ifelse(x <= 1990 & y > 1990, 1,
    ifelse(x <= 1991 & y > 1991, 2,
      ifelse(x <= 1992 & y > 1992, 3, 0)))
}

makegroup1(df$begin, df$end)
[1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
makegroup2(df$begin, df$end)
[1] 1 2 3 NA NA 2 3 NA NA NA 3 NA NA NA NA
df

But really, it's a better idea to develop an unambiguous statement of your desired output.

Sarah
--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] Grouping and/or splitting
On 04-04-2012, at 07:15, Ashish Agarwal wrote:

Yes. I was missing the drop argument. But now the problem is that splitting is causing some weird ordering of groups.

Why weird? See below:

DF <- read.table(text = "
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58
3,1,5,7", header = TRUE, sep = ",")

aa <- split(DF, DF[, 1:2], drop = TRUE)

Now the result is that aa[3] is (3,1) and not (2,2). Why? How can I preserve the ascending order?

Try this

aa[order(names(aa))]

Berend

aa[3]
$`3.1`
  Houseid Personid Tripid taz
7       3        1      5   7

aa[4]
$`2.2`
  Houseid Personid Tripid taz
6       2        2      1  58

On Wed, Apr 4, 2012 at 6:29 AM, Rui Barradas rui1...@sapo.pt wrote:
Hello,

Ashish Agarwal wrote
I have a dataframe imported from a csv file below:

Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58

There are three groups identified based on the combination of the first and second columns. How do I split this data frame? I tried

aa <- split(inpfil, inpfil[, 1:2])

but it has problems. Output desired is

aa[1]
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7

aa[2]
Houseid,Personid,Tripid,taz
2,1,1,96
2,1,2,4
2,1,3,2

aa[3]
Houseid,Personid,Tripid,taz
2,2,1,58

Any of the following three works for me.

DF <- read.table(text = "
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58", header = TRUE, sep = ",")
DF
split(DF, DF[, 1:2], drop = TRUE)
split(DF, list(DF$Houseid, DF$Personid), drop = TRUE)
with(DF, split(DF, list(Houseid, Personid), drop = TRUE))

The argument 'drop' defaults to FALSE. Was that the problem?
Hope this helps,
Rui Barradas
Re: [R] Grouping and/or splitting
Thanks a ton! It was weird because I expected the ordering to be preserved by default. Anyway, your workaround along with Weidong's method are both good solutions.

On Wed, Apr 4, 2012 at 12:10 PM, Berend Hasselman b...@xs4all.nl wrote:
On 04-04-2012, at 07:15, Ashish Agarwal wrote:
Yes. I was missing the drop argument. But now the problem is that splitting is causing some weird ordering of groups.

Why weird? See below:

DF <- read.table(text = "
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58
3,1,5,7", header = TRUE, sep = ",")

aa <- split(DF, DF[, 1:2], drop = TRUE)

Now the result is that aa[3] is (3,1) and not (2,2). Why? How can I preserve the ascending order?

Try this

aa[order(names(aa))]

Berend

aa[3]
$`3.1`
  Houseid Personid Tripid taz
7       3        1      5   7

aa[4]
$`2.2`
  Houseid Personid Tripid taz
6       2        2      1  58
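[Editor's note] A sketch (not from the thread) showing why the order comes out that way: split() groups by interaction(Houseid, Personid), whose levels vary the first factor fastest, so every Personid == 1 group precedes any Personid == 2 group. With single-digit ids, Berend's name-based reordering restores ascending (Houseid, Personid) order; with ids of 10 or more, a plain lexicographic sort of the names would need more care.

```r
# Sketch illustrating the group ordering from split().  The level order
# of interaction(Houseid, Personid) is 1.1, 2.1, 3.1, 1.2, 2.2, 3.2
# (first factor varies fastest); drop = TRUE keeps only the non-empty
# combinations, in that same order.
DF <- read.table(text = "
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58
3,1,5,7", header = TRUE, sep = ",")

aa <- split(DF, DF[, 1:2], drop = TRUE)
names(aa)                   # "1.1" "2.1" "3.1" "2.2"
aa <- aa[order(names(aa))]  # Berend's fix
names(aa)                   # "1.1" "2.1" "2.2" "3.1"
```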
[R] grouping
Hi all,
Assume that I have the following 10 data points.

x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

Sort x and get the following:

y = (36, 45, 46, 66, 78, 125, 193, 209, 242, 297)

I want to group the sorted data points (y) into an equal number of observations per group. In this case there will be three groups. The first two groups will have three observations and the third will have four observations:

group 1 = 34, 45, 46
group 2 = 66, 78, 125
group 3 = 193, 209, 242, 297

Finally I want to calculate the group means:

group 1 = 42
group 2 = 87
group 3 = 234

Can anyone help me out? In SAS I used to do it using proc rank.

thanks in advance
Val
Re: [R] grouping
On Apr 3, 2012, at 8:47 AM, Val wrote:
Hi all, Assume that I have the following 10 data points.
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
Sort x and get the following:
y = (36, 45, 46, 66, 78, 125, 193, 209, 242, 297)

The methods below do not require a sorting step.

I want to group the sorted data points (y) into an equal number of observations per group. In this case there will be three groups. The first two groups will have three observations and the third will have four observations:
group 1 = 34, 45, 46
group 2 = 66, 78, 125
group 3 = 193, 209, 242, 297
Finally I want to calculate the group means:
group 1 = 42
group 2 = 87
group 3 = 234

I hope those weren't answers from SAS.

Can anyone help me out?

I usually do this with Hmisc::cut2 since it has a `g = n` parameter that auto-magically calls the quantile splitting criterion, but this can be done in base R:

split(x, cut(x, quantile(x, prob = c(0, .333, .66, 1)), include.lowest = TRUE))
$`[36,65.9]`
[1] 36 45 46
$`(65.9,189]`
[1] 66 78 125
$`(189,297]`
[1] 193 209 242 297

lapply(split(x, cut(x, quantile(x, prob = c(0, .333, .66, 1)), include.lowest = TRUE)), mean)
$`[36,65.9]`
[1] 42.33333
$`(65.9,189]`
[1] 89.66667
$`(189,297]`
[1] 235.25

Or to get a table instead of a list:

tapply(x, cut(x, quantile(x, prob = c(0, .333, .66, 1)), include.lowest = TRUE), mean)
 [36,65.9] (65.9,189]  (189,297]
  42.33333   89.66667  235.25000

In SAS I used to do it using proc rank.

?quantile isn't equivalent to Proc Rank but it will provide a useful basis for splitting or tabling functions.

thanks in advance
Val
David Winsemius, MD
West Hartford, CT
Re: [R] grouping
Ignoring the fact that your desired answers are wrong, I'd split the separating part and the group-means part into three steps:

i) quantile() can help you get the split points,
ii) findInterval() can assign each y to a group,
iii) then ave() or tapply() will do group-wise means.

Something like:

y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

You could also use cut2 from the Hmisc package to combine findInterval and quantile into a single step. Depending on your desired output.

Hope that helps,
Michael

On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
Hi all, Assume that I have the following 10 data points.
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
Sort x and get the following:
y = (36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
I want to group the sorted data points (y) into an equal number of observations per group. In this case there will be three groups. The first two groups will have three observations and the third will have four observations:
group 1 = 34, 45, 46
group 2 = 66, 78, 125
group 3 = 193, 209, 242, 297
Finally I want to calculate the group means:
group 1 = 42
group 2 = 87
group 3 = 234
Can anyone help me out? In SAS I used to do it using proc rank.
thanks in advance
Val
Re: [R] grouping
Probably something along the following lines:

x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
sorted <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
tapply(sorted, INDEX = (seq_along(sorted) - 1) %/% 3, FUN = mean)
        0         1         2         3
 42.33333  89.66667 214.66667 297.00000

Hope this helps,
Giovanni

On Tue, 2012-04-03 at 08:47 -0400, Val wrote:
Hi all, Assume that I have the following 10 data points.
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
Sort x and get the following:
y = (36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
I want to group the sorted data points (y) into an equal number of observations per group. In this case there will be three groups. The first two groups will have three observations and the third will have four observations:
group 1 = 34, 45, 46
group 2 = 66, 78, 125
group 3 = 193, 209, 242, 297
Finally I want to calculate the group means:
group 1 = 42
group 2 = 87
group 3 = 234
Can anyone help me out? In SAS I used to do it using proc rank.
thanks in advance
Val

--
Giovanni Petris gpet...@uark.edu
Associate Professor
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/
Re: [R] grouping
Hi! Maybe not the most elegant solution, but it works:

for (i in seq(1, length(data) - (length(data) %% 3), 3)) {
  ifelse((length(data) - i) > 3,
    { print(sort(data)[c(i:(i + 2))])
      print(mean(sort(data)[c(i:(i + 2))])) },
    { print(sort(data)[c(i:length(data))])
      print(mean(sort(data)[c(i:length(data))])) }
  )
}

Produces:

[1] 36 45 46
[1] 42.33333
[1] 66 78 125
[1] 89.66667
[1] 193 209 242 297
[1] 235.25

HTH,
Kimmo
Re: [R] grouping
Thank you all (David, Michael, Giovanni) for your prompt response. First, there was a typo in the group means: it was 89.6, not 87.

For a small data set and a few groupings I can use prob = c(0, .333, .66, 1) to group into three groups, as in this case. However, if I want to extend the number of groupings, say to 10 or 15, do I have to figure out the prob vector for

split(x, cut(x, quantile(x, prob = c(0, .333, .66, 1))))

myself? Is there a shortcut for that?

Thanks

On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt michael.weyla...@gmail.com wrote:
Ignoring the fact that your desired answers are wrong, I'd split the separating part and the group-means part into three steps:
i) quantile() can help you get the split points,
ii) findInterval() can assign each y to a group,
iii) then ave() or tapply() will do group-wise means.
Something like:
y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)
You could also use cut2 from the Hmisc package to combine findInterval and quantile into a single step. Depending on your desired output.
Hope that helps,
Michael

On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
Hi all, Assume that I have the following 10 data points.
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
Sort x and get the following:
y = (36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
I want to group the sorted data points (y) into an equal number of observations per group. In this case there will be three groups. The first two groups will have three observations and the third will have four observations:
group 1 = 34, 45, 46
group 2 = 66, 78, 125
group 3 = 193, 209, 242, 297
Finally I want to calculate the group means:
group 1 = 42
group 2 = 87
group 3 = 234
Can anyone help me out? In SAS I used to do it using proc rank.
thanks in advance
Val
Re: [R] grouping
Use cut2 as I suggested and David demonstrated.

Michael

On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:
Thank you all (David, Michael, Giovanni) for your prompt response. First, there was a typo in the group means: it was 89.6, not 87. For a small data set and a few groupings I can use prob = c(0, .333, .66, 1) to group into three groups, as in this case. However, if I want to extend the number of groupings, say to 10 or 15, do I have to figure out the prob vector for split(x, cut(x, quantile(x, prob = c(0, .333, .66, 1)))) myself? Is there a shortcut for that?
Thanks

On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt michael.weyla...@gmail.com wrote:
Ignoring the fact that your desired answers are wrong, I'd split the separating part and the group-means part into three steps:
i) quantile() can help you get the split points,
ii) findInterval() can assign each y to a group,
iii) then ave() or tapply() will do group-wise means.
Something like:
y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)
You could also use cut2 from the Hmisc package to combine findInterval and quantile into a single step. Depending on your desired output.
Hope that helps,
Michael

On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
Hi all, Assume that I have the following 10 data points.
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
Sort x and get the following:
y = (36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
I want to group the sorted data points (y) into an equal number of observations per group. In this case there will be three groups. The first two groups will have three observations and the third will have four observations:
group 1 = 34, 45, 46
group 2 = 66, 78, 125
group 3 = 193, 209, 242, 297
Finally I want to calculate the group means:
group 1 = 42
group 2 = 87
group 3 = 234
Can anyone help me out? In SAS I used to do it using proc rank.
thanks in advance
Val
Re: [R] grouping
On Tue, Apr 03, 2012 at 09:31:29AM -0400, Val wrote:
Thank you all (David, Michael, Giovanni) for your prompt response. First, there was a typo in the group means: it was 89.6, not 87. For a small data set and a few groupings I can use prob = c(0, .333, .66, 1) to group into three groups, as in this case. However, if I want to extend the number of groupings, say to 10 or 15, do I have to figure out the prob vector for split(x, cut(x, quantile(x, prob = c(0, .333, .66, 1)))) myself? Is there a shortcut for that?

Hi.

There may be better ways for the whole task, but specifically c(0, .333, .66, 1) can be obtained as

seq(0, 1, length = 3 + 1)

Hope this helps.

Petr Savicky.
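[Editor's note] Putting Petr's shortcut together with the earlier split()/cut() call gives a grouping for any number of groups n (a sketch, not from the thread). One caveat: with probabilities at exact thirds, the 1/3 quantile of this particular x is exactly 66, so 66 lands in the first interval; the slightly-off .333/.66 probabilities used earlier in the thread happen to avoid that tie.

```r
# Sketch: quantile-based grouping for an arbitrary number of groups n.
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
n <- 3
breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1))
groups <- split(x, cut(x, breaks, include.lowest = TRUE))
sapply(groups, mean)
# With exact thirds the breaks are 36, 66, 193, 297, so the groups are
# {36,45,46,66}, {78,125,193}, {209,242,297}, with means 48.25, 132, 249.33.
```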
Re: [R] grouping
On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:
Use cut2 as I suggested and David demonstrated.

Agree that Hmisc::cut2 is extremely handy, and I also like the fact that the closed ends of intervals are on the left side (which is not the same behavior as cut()), which has the other effect of setting include.lowest = TRUE, which is not the default for cut() either (to my continued amazement). But let me add the method I use when doing it by hand:

cut(x, quantile(x, prob = seq(0, 1, length = ngrps + 1)), include.lowest = TRUE)

-- David.

Michael

On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:
Thank you all (David, Michael, Giovanni) for your prompt response. First, there was a typo in the group means: it was 89.6, not 87. For a small data set and a few groupings I can use prob = c(0, .333, .66, 1) to group into three groups, as in this case. However, if I want to extend the number of groupings, say to 10 or 15, do I have to figure out the prob vector for split(x, cut(x, quantile(x, prob = c(0, .333, .66, 1)))) myself? Is there a shortcut for that?
Thanks

On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt michael.weyla...@gmail.com wrote:
Ignoring the fact that your desired answers are wrong, I'd split the separating part and the group-means part into three steps:
i) quantile() can help you get the split points,
ii) findInterval() can assign each y to a group,
iii) then ave() or tapply() will do group-wise means.
Something like:
y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)
You could also use cut2 from the Hmisc package to combine findInterval and quantile into a single step. Depending on your desired output.
Hope that helps,
Michael

On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
Hi all, Assume that I have the following 10 data points.
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
Sort x and get the following:
y = (36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
I want to group the sorted data points (y) into an equal number of observations per group. In this case there will be three groups. The first two groups will have three observations and the third will have four observations:
group 1 = 34, 45, 46
group 2 = 66, 78, 125
group 3 = 193, 209, 242, 297
Finally I want to calculate the group means:
group 1 = 42
group 2 = 87
group 3 = 234
Can anyone help me out? In SAS I used to do it using proc rank.
thanks in advance
Val

David Winsemius, MD
West Hartford, CT
Re: [R] grouping
Or just replace c(0, .333, .667, 1) with

n <- 10
split(x, cut(x, quantile(x, prob = c(0, 1:(n-1)/n, 1)), include.lowest = TRUE))

where n is the number of groups you want.

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of R. Michael Weylandt
Sent: Tuesday, April 03, 2012 8:32 AM
To: Val
Cc: r-help@r-project.org
Subject: Re: [R] grouping

Use cut2 as I suggested and David demonstrated.

Michael

On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:
Thank you all (David, Michael, Giovanni) for your prompt response. First, there was a typo in the group means: it was 89.6, not 87. For a small data set and a few groupings I can use prob = c(0, .333, .66, 1) to group into three groups, as in this case. However, if I want to extend the number of groupings, say to 10 or 15, do I have to figure out the prob vector for split(x, cut(x, quantile(x, prob = c(0, .333, .66, 1)))) myself? Is there a shortcut for that?
Thanks

On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt michael.weyla...@gmail.com wrote:
Ignoring the fact that your desired answers are wrong, I'd split the separating part and the group-means part into three steps:
i) quantile() can help you get the split points,
ii) findInterval() can assign each y to a group,
iii) then ave() or tapply() will do group-wise means.
Something like:
y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)
You could also use cut2 from the Hmisc package to combine findInterval and quantile into a single step. Depending on your desired output.
Hope that helps,
Michael

On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
Hi all, Assume that I have the following 10 data points.
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
Sort x and get the following:
y = (36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
I want to group the sorted data points (y) into an equal number of observations per group. In this case there will be three groups. The first two groups will have three observations and the third will have four observations:
group 1 = 34, 45, 46
group 2 = 66, 78, 125
group 3 = 193, 209, 242, 297
Finally I want to calculate the group means:
group 1 = 42
group 2 = 87
group 3 = 234
Can anyone help me out? In SAS I used to do it using proc rank.
thanks in advance
Val
Re: [R] grouping
David W and all,

Thank you very much for your help. Here is the final output that I want, in the form of a data frame. The data frame should contain x, group, and group_mean in the following way:

  x  group  group_mean
 46      1       42.3
125      2       89.6
 36      1       42.3
193      3      235.25
209      3      235.25
 78      2       89.6
 66      2       89.6
242      3      235.25
297      3      235.25
 45      1       42.3

Thanks a lot

On Tue, Apr 3, 2012 at 9:51 AM, David Winsemius dwinsem...@comcast.net wrote:
On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:
Use cut2 as I suggested and David demonstrated.

Agree that Hmisc::cut2 is extremely handy, and I also like the fact that the closed ends of intervals are on the left side (which is not the same behavior as cut()), which has the other effect of setting include.lowest = TRUE, which is not the default for cut() either (to my continued amazement). But let me add the method I use when doing it by hand:

cut(x, quantile(x, prob = seq(0, 1, length = ngrps + 1)), include.lowest = TRUE)

-- David.

Michael

On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:
Thank you all (David, Michael, Giovanni) for your prompt response. First, there was a typo in the group means: it was 89.6, not 87. For a small data set and a few groupings I can use prob = c(0, .333, .66, 1) to group into three groups, as in this case. However, if I want to extend the number of groupings, say to 10 or 15, do I have to figure out the prob vector for split(x, cut(x, quantile(x, prob = c(0, .333, .66, 1)))) myself? Is there a shortcut for that?
Thanks

On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt michael.weyla...@gmail.com wrote:
Ignoring the fact that your desired answers are wrong, I'd split the separating part and the group-means part into three steps:
i) quantile() can help you get the split points,
ii) findInterval() can assign each y to a group,
iii) then ave() or tapply() will do group-wise means.
Something like:
y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)
You could also use cut2 from the Hmisc package to combine findInterval and quantile into a single step. Depending on your desired output.
Hope that helps,
Michael

On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
Hi all, Assume that I have the following 10 data points.
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
Sort x and get the following:
y = (36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
I want to group the sorted data points (y) into an equal number of observations per group. In this case there will be three groups. The first two groups will have three observations and the third will have four observations:
group 1 = 34, 45, 46
group 2 = 66, 78, 125
group 3 = 193, 209, 242, 297
Finally I want to calculate the group means:
group 1 = 42
group 2 = 87
group 3 = 234
Can anyone help me out? In SAS I used to do it using proc rank.
thanks in advance
Val

David Winsemius, MD
West Hartford, CT
Re: [R] grouping
On Apr 3, 2012, at 10:11 AM, Val wrote:
David W and all, Thank you very much for your help. Here is the final output that I want, in the form of a data frame. The data frame should contain x, group, and group_mean in the following way:

  x  group  group_mean
 46      1       42.3
125      2       89.6
 36      1       42.3
193      3      235.25
209      3      235.25
 78      2       89.6
 66      2       89.6
242      3      235.25
297      3      235.25
 45      1       42.3

If you want group means in a vector the same length as x, then instead of using tapply as done in the earlier solutions you should use `ave`.

-- DW

Thanks a lot

On Tue, Apr 3, 2012 at 9:51 AM, David Winsemius dwinsem...@comcast.net wrote:
On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:
Use cut2 as I suggested and David demonstrated.
Agree that Hmisc::cut2 is extremely handy, and I also like the fact that the closed ends of intervals are on the left side (which is not the same behavior as cut()), which has the other effect of setting include.lowest = TRUE, which is not the default for cut() either (to my continued amazement). But let me add the method I use when doing it by hand:
cut(x, quantile(x, prob = seq(0, 1, length = ngrps + 1)), include.lowest = TRUE)
-- David.
Michael
On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:
Thank you all (David, Michael, Giovanni) for your prompt response. First, there was a typo in the group means: it was 89.6, not 87. For a small data set and a few groupings I can use prob = c(0, .333, .66, 1) to group into three groups, as in this case. However, if I want to extend the number of groupings, say to 10 or 15, do I have to figure out the prob vector for split(x, cut(x, quantile(x, prob = c(0, .333, .66, 1)))) myself? Is there a shortcut for that?
Thanks
On Tue, Apr 3, 2012 at 9:13 AM, R.
Michael Weylandt michael.weyla...@gmail.com wrote:
Ignoring the fact that your desired answers are wrong, I'd split the separating part and the group-means part into three steps:
i) quantile() can help you get the split points,
ii) findInterval() can assign each y to a group,
iii) then ave() or tapply() will do group-wise means.
Something like:
y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)
You could also use cut2 from the Hmisc package to combine findInterval and quantile into a single step. Depending on your desired output.
Hope that helps,
Michael
On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
Hi all, Assume that I have the following 10 data points.
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
Sort x and get the following:
y = (36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
I want to group the sorted data points (y) into an equal number of observations per group. In this case there will be three groups. The first two groups will have three observations and the third will have four observations:
group 1 = 34, 45, 46
group 2 = 66, 78, 125
group 3 = 193, 209, 242, 297
Finally I want to calculate the group means:
group 1 = 42
group 2 = 87
group 3 = 234
Can anyone help me out? In SAS I used to do it using proc rank.
thanks in advance
Val
David Winsemius, MD West Hartford, CT David Winsemius, MD West Hartford, CT [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
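Putting the thread's pieces together, here is a minimal self-contained sketch of the quantile-grouping approach; the `ngrps` variable and the column names are illustrative, not from the original emails:

```r
# Equal-count grouping by sample quantiles, as discussed in the thread
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
ngrps <- 3

# David's by-hand method: quantile breaks plus include.lowest = TRUE,
# with labels = FALSE to get plain group numbers 1..ngrps
grp <- cut(x, quantile(x, prob = seq(0, 1, length = ngrps + 1)),
           include.lowest = TRUE, labels = FALSE)

tapply(x, grp, mean)                                  # one mean per group
data.frame(x, group = grp, group_mean = ave(x, grp))  # means recycled to length(x)
```

As noted above, Hmisc::cut2(x, g = ngrps) collapses the break computation and the cutting into a single step.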
Re: [R] grouping
Hi All, On the same data points x=c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45) I want to have the following output as a data frame:
x group group_mean
46 1 42.3
125 2 89.6
36 1 42.3
193 3 235.25
209 3 235.25
78 2 89.6
66 2 89.6
242 3 235.25
297 3 235.25
45 1 42.3
I tried the following code
dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))
gxc <- with(dat, tapply(xc, group, mean))
dat$gxc <- gxce[as.character(dat$group)]
txc=dat$gxc
It did not work for me. On Tue, Apr 3, 2012 at 10:15 AM, David Winsemius dwinsem...@comcast.net wrote: On Apr 3, 2012, at 10:11 AM, Val wrote: David W and all, Thank you very much for your help. Here is the final output that I want in the form of a data frame. The data frame should contain x, group and group_mean in the following way: x group group_mean 46 1 42.3 125 2 89.6 36 1 42.3 193 3 235.25 209 3 235.25 78 2 89.6 66 2 89.6 242 3 235.25 297 3 235.25 45 1 42.3 If you want group means in a vector the same length as x, then instead of using tapply as done in earlier solutions you should use `ave`. -- DW Thanks a lot On Tue, Apr 3, 2012 at 9:51 AM, David Winsemius dwinsem...@comcast.net wrote: On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote: Use cut2 as I suggested and David demonstrated. Agree that Hmisc::cut2 is extremely handy, and I also like the fact that the closed ends of intervals are on the left side (which is not the same behavior as cut()), which has the other effect of setting include.lowest = TRUE, which is not the default for cut() either (to my continued amazement). But let me add the method I use when doing it by hand: cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)), include.lowest=TRUE) -- David. Michael On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote: Thank you all (David, Michael, Giovanni) for your prompt response. First, there was a typo in the group mean: it was 89.6, not 87. For a small data set and few groupings I can use prob=c(0, .333, .66, 1) to group into three groups in this case.
However, if I want to extend the number of groupings say 10 or 15 then do I have to figure it out the split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) Is there a short cut for that? Thanks On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt michael.weyla...@gmail.com wrote: Ignoring the fact your desired answers are wrong, I'd split the separating part and the group means parts into three steps: i) quantile() can help you get the split points, ii) findInterval() can assign each y to a group iii) then ave() or tapply() will do group-wise means Something like: y - c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here. ave(y, findInterval(y, quantile(y, c(0.33, 0.66 tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean) You could also use cut2 from the Hmisc package to combine findInterval and quantile into a single step. Depending on your desired output. Hope that helps, Michael On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote: Hi all, Assume that I have the following 10 data points. x=c( 46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45) sort x and get the following y= (36 , 45 , 46, 66, 78, 125,193, 209, 242, 297) I want to group the sorted data point (y) into equal number of observation per group. In this case there will be three groups. The first two groups will have three observation and the third will have four observations group 1 = 34, 45, 46 group 2 = 66, 78, 125 group 3 = 193, 209, 242,297 Finally I want to calculate the group mean group 1 = 42 group 2 = 87 group 3 = 234 Can anyone help me out? In SAS I used to do it using proc rank. thanks in advance Val [[alternative HTML version deleted]] __** R-help@r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/**posting-guide.htmlhttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
__** R-help@r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT David Winsemius, MD West Hartford, CT
Re: [R] grouping
On Tue, Apr 03, 2012 at 02:21:36PM -0400, Val wrote: Hi All, On the same data points x=c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45) I want to have the following output as a data frame:
x group group_mean
46 1 42.3
125 2 89.6
36 1 42.3
193 3 235.25
209 3 235.25
78 2 89.6
66 2 89.6
242 3 235.25
297 3 235.25
45 1 42.3
I tried the following code
dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))
gxc <- with(dat, tapply(xc, group, mean))
dat$gxc <- gxce[as.character(dat$group)]
txc=dat$gxc
It did not work for me. David Winsemius suggested using ave() when you asked this question the first time. Can you have a look at it? Petr Savicky.
Re: [R] grouping
I did look at it; the result is below:
x=c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
#lapply( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)), include.lowest=TRUE) ), mean)
ave( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)), include.lowest=TRUE) ), mean)
$`[36,74]`
[1] NA
$`(74,197]`
[1] NA
$`(197,297]`
[1] NA
There were 11 warnings (use warnings() to see them)
On Tue, Apr 3, 2012 at 2:35 PM, Petr Savicky savi...@cs.cas.cz wrote: On Tue, Apr 03, 2012 at 02:21:36PM -0400, Val wrote: Hi All, On the same data points x=c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45) I want to have the following output as a data frame: x group group_mean 46 1 42.3 125 2 89.6 36 1 42.3 193 3 235.25 209 3 235.25 78 2 89.6 66 2 89.6 242 3 235.25 297 3 235.25 45 1 42.3 I tried the following code dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)))) gxc <- with(dat, tapply(xc, group, mean)) dat$gxc <- gxce[as.character(dat$group)] txc=dat$gxc It did not work for me. David Winsemius suggested using ave() when you asked this question the first time. Can you have a look at it? Petr Savicky.
Re: [R] grouping
On Tue, Apr 3, 2012 at 2:53 PM, Berend Hasselman b...@xs4all.nl wrote: On 03-04-2012, at 20:21, Val wrote: Hi All, On the same data points x=c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45) I want to have the following output as a data frame: x group group_mean 46 1 42.3 125 2 89.6 36 1 42.3 193 3 235.25 209 3 235.25 78 2 89.6 66 2 89.6 242 3 235.25 297 3 235.25 45 1 42.3 I tried the following code
dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))
gxc <- with(dat, tapply(xc, group, mean))
dat$gxc <- gxce[as.character(dat$group)]
txc=dat$gxc
It did not work for me. I'm not surprised. In the line dat <- there are 5 opening parentheses and 4 closing )'s. In the line dat$gxc <- you reference an object gxce. Where was it created? So I tried this
dat <- data.frame(x, group=findInterval(x, quantile(x, prob=c(0, .333, .66 ,1)), all.inside=TRUE))
dat$gmean <- ave(dat$x, as.factor(dat$group))
dat
     x group     gmean
1   46     1  42.33333
2  125     2  89.66667
3   36     1  42.33333
4  193     3 235.25000
5  209     3 235.25000
6   78     2  89.66667
7   66     2  89.66667
8  242     3 235.25000
9  297     3 235.25000
10  45     1  42.33333
Thank you very much. It is working now. There was a typo, gxce; in the R code it was correct, gxc. Berend
Re: [R] grouping
On 03-04-2012, at 21:02, Val wrote: On Tue, Apr 3, 2012 at 2:53 PM, Berend Hasselman b...@xs4all.nl wrote: On 03-04-2012, at 20:21, Val wrote: Hi All, On the same data points x=c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45) I want to have the following output as a data frame: x group group_mean 46 1 42.3 125 2 89.6 36 1 42.3 193 3 235.25 209 3 235.25 78 2 89.6 66 2 89.6 242 3 235.25 297 3 235.25 45 1 42.3 I tried the following code dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)))) gxc <- with(dat, tapply(xc, group, mean)) dat$gxc <- gxce[as.character(dat$group)] txc=dat$gxc It did not work for me. I'm not surprised. In the line dat <- there are 5 opening parentheses and 4 closing )'s. In the line dat$gxc <- you reference an object gxce. Where was it created? So I tried this
dat <- data.frame(x, group=findInterval(x, quantile(x, prob=c(0, .333, .66 ,1)), all.inside=TRUE))
dat$gmean <- ave(dat$x, as.factor(dat$group))
And the as.factor is not necessary. This will do:
dat$gmean <- ave(dat$x, dat$group)
Berend
Re: [R] grouping
On 03-04-2012, at 20:21, Val wrote: Hi All, On the same data points x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 ) I want to have have the following output as data frame x group group mean 46 142.3 125 289.6 36 142.3 193 3235.25 209 3235.25 78 289.6 66 289.6 242 3235.25 297 3235.25 45 142.3 I tried the following code dat - data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1 gxc - with(dat, tapply(xc, group, mean)) dat$gxc - gxce[as.character(dat$group)] txc=dat$gxc it did not work for me. I'm not surprised. In the line dat - there are 5 opening parentheses and 4 closing )'s. In the line dat$gxc - you reference an object gxce. Where was it created? So I tried this dat - data.frame(x, group=findInterval(x, quantile(x, prob=c(0, .333, .66 ,1)), all.inside=TRUE)) dat$gmean - ave(dat$x, as.factor(dat$group)) dat x group gmean 1 46 1 42.3 2 125 2 89.7 3 36 1 42.3 4 193 3 235.25000 5 209 3 235.25000 6 78 2 89.7 7 66 2 89.7 8 242 3 235.25000 9 297 3 235.25000 10 45 1 42.3 Berend __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] grouping
Please take a look at my first reply to you: ave(y, findInterval(y, quantile(y, c(0.33, 0.66)))) Then read ?ave for an explanation of the syntax. ave takes two vectors, the first being the data to be averaged, the second being an index to split by. You don't want to use split() here. Michael On Tue, Apr 3, 2012 at 2:50 PM, Val valkr...@gmail.com wrote: I did look at it; the result is below:
x=c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
#lapply( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)), include.lowest=TRUE) ), mean)
ave( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)), include.lowest=TRUE) ), mean)
$`[36,74]`
[1] NA
$`(74,197]`
[1] NA
$`(197,297]`
[1] NA
There were 11 warnings (use warnings() to see them)
On Tue, Apr 3, 2012 at 2:35 PM, Petr Savicky savi...@cs.cas.cz wrote: On Tue, Apr 03, 2012 at 02:21:36PM -0400, Val wrote: Hi All, On the same data points x=c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45) I want to have the following output as a data frame: x group group_mean 46 1 42.3 125 2 89.6 36 1 42.3 193 3 235.25 209 3 235.25 78 2 89.6 66 2 89.6 242 3 235.25 297 3 235.25 45 1 42.3 I tried the following code dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)))) gxc <- with(dat, tapply(xc, group, mean)) dat$gxc <- gxce[as.character(dat$group)] txc=dat$gxc It did not work for me. David Winsemius suggested using ave() when you asked this question the first time. Can you have a look at it? Petr Savicky.
[R] Grouping and/or splitting
I have a dataframe imported from csv file below: Houseid,Personid,Tripid,taz 1,1,1,4 1,1,2,7 2,1,1,96 2,1,2,4 2,1,3,2 2,2,1,58 There are three groups identified based on the combination of first and second columns. How do I split this data frame? I tried aa - split(inpfil, inpfil[,1:2]) but it has problems. Output desired is aa[1] Houseid,Personid,Tripid,taz 1,1,1,4 1,1,2,7 aa[2] Houseid,Personid,Tripid,taz 2,1,1,96 2,1,2,4 2,1,3,2 aa[3] Houseid,Personid,Tripid,taz 2,2,1,58 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping and/or splitting
how about split(inpfil, paste(inpfil[,1],inpfil[,2],sep=',')) Weidong Gu On Tue, Apr 3, 2012 at 6:42 PM, Ashish Agarwal ashish.agarw...@gmail.com wrote: I have a dataframe imported from csv file below: Houseid,Personid,Tripid,taz 1,1,1,4 1,1,2,7 2,1,1,96 2,1,2,4 2,1,3,2 2,2,1,58 There are three groups identified based on the combination of first and second columns. How do I split this data frame? I tried aa - split(inpfil, inpfil[,1:2]) but it has problems. Output desired is aa[1] Houseid,Personid,Tripid,taz 1,1,1,4 1,1,2,7 aa[2] Houseid,Personid,Tripid,taz 2,1,1,96 2,1,2,4 2,1,3,2 aa[3] Houseid,Personid,Tripid,taz 2,2,1,58 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping and/or splitting
Hello, Ashish Agarwal wrote: I have a dataframe imported from a csv file below: Houseid,Personid,Tripid,taz 1,1,1,4 1,1,2,7 2,1,1,96 2,1,2,4 2,1,3,2 2,2,1,58 There are three groups identified based on the combination of the first and second columns. How do I split this data frame? I tried aa <- split(inpfil, inpfil[,1:2]) but it has problems. Output desired is aa[1] Houseid,Personid,Tripid,taz 1,1,1,4 1,1,2,7 aa[2] Houseid,Personid,Tripid,taz 2,1,1,96 2,1,2,4 2,1,3,2 aa[3] Houseid,Personid,Tripid,taz 2,2,1,58 Any of the following three works for me:
DF <- read.table(text="
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58", header=TRUE, sep=",")
DF
split(DF, DF[, 1:2], drop=TRUE)
split(DF, list(DF$Houseid, DF$Personid), drop=TRUE)
with(DF, split(DF, list(Houseid, Personid), drop=TRUE))
The argument 'drop' defaults to FALSE. Was that the problem? Hope this helps, Rui Barradas -- View this message in context: http://r.789695.n4.nabble.com/Grouping-and-or-splitting-tp4530410p4530624.html Sent from the R help mailing list archive at Nabble.com.
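As a self-contained sketch of Rui's point (the data frame is the one from the question; drop = TRUE is the key argument):

```r
# split() on two columns builds one group per *observed* Houseid/Personid pair
DF <- read.table(text = "
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58", header = TRUE, sep = ",")

# drop = TRUE removes the combinations that never occur (here, Houseid 1 / Personid 2)
aa <- split(DF, DF[, 1:2], drop = TRUE)
names(aa)           # group names are the pasted factor levels: "1.1" "2.1" "2.2"
nrow(aa[["2.1"]])   # the three trips of household 2, person 1
```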
Re: [R] Grouping and/or splitting
Yes. I was missing the drop argument. But now the problem is that splitting causes some weird ordering of the groups. See below:
DF <- read.table(text="
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58
3,1,5,7", header=TRUE, sep=",")
aa <- split(DF, DF[, 1:2], drop=TRUE)
Now the result is that aa[3] is (3,1) and not (2,2). Why? How can I preserve the ascending order?
aa[3]
$`3.1`
  Houseid Personid Tripid taz
7       3        1      5   7
aa[4]
$`2.2`
  Houseid Personid Tripid taz
6       2        2      1  58
On Wed, Apr 4, 2012 at 6:29 AM, Rui Barradas rui1...@sapo.pt wrote: Hello, Ashish Agarwal wrote: I have a dataframe imported from a csv file below: Houseid,Personid,Tripid,taz 1,1,1,4 1,1,2,7 2,1,1,96 2,1,2,4 2,1,3,2 2,2,1,58 There are three groups identified based on the combination of the first and second columns. How do I split this data frame? I tried aa <- split(inpfil, inpfil[,1:2]) but it has problems. Output desired is aa[1] Houseid,Personid,Tripid,taz 1,1,1,4 1,1,2,7 aa[2] Houseid,Personid,Tripid,taz 2,1,1,96 2,1,2,4 2,1,3,2 aa[3] Houseid,Personid,Tripid,taz 2,2,1,58 Any of the following three works for me: DF <- read.table(text="Houseid,Personid,Tripid,taz 1,1,1,4 1,1,2,7 2,1,1,96 2,1,2,4 2,1,3,2 2,2,1,58", header=TRUE, sep=",") DF split(DF, DF[, 1:2], drop=TRUE) split(DF, list(DF$Houseid, DF$Personid), drop=TRUE) with(DF, split(DF, list(Houseid, Personid), drop=TRUE)) The argument 'drop' defaults to FALSE. Was that the problem?
Hope this helps, Rui Barrada [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
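The ordering question above follows from how split() builds its groups: the interaction of the grouping factors varies the first factor fastest, so every Personid == 1 group comes before (2,2). One way to get Houseid-major order is interaction() with lex.order = TRUE; this is my suggestion, not something proposed in the thread:

```r
DF <- read.table(text = "
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58
3,1,5,7", header = TRUE, sep = ",")

# lex.order = TRUE makes the first factor the slowest-varying one, so the
# group levels come out in Houseid-then-Personid (ascending) order
g  <- interaction(DF$Houseid, DF$Personid, drop = TRUE, lex.order = TRUE)
aa <- split(DF, g)
names(aa)   # "1.1" "2.1" "2.2" "3.1"
```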
[R] Grouping together a time variable
I have the following variable, time, which is a character variable and it's structured as follows. head(as.character(dat$time), 30) [1] 00:00:01 00:00:16 00:00:24 00:00:25 00:00:25 00:00:40 00:01:50 00:01:54 00:02:33 00:02:43 00:03:22 [12] 00:03:31 00:03:41 00:03:42 00:03:43 00:04:04 00:05:09 00:05:17 00:05:19 00:05:21 00:05:22 00:05:22 [23] 00:05:28 00:05:44 00:05:54 00:06:54 00:06:54 00:07:10 00:08:15 00:08:26 What I am trying to do is group the data into one hour increment. So 5:01-6:00am, 6:01-7:00am, 7:01-8:00a, and so forth. However, I'm not sure if there's a simple route to do this in R or how to do it. Can anyone point me in the right direction? -- *Abraham Mathew Statistical Analyst www.amathew.com 720-648-0108 @abmathewks* [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping together a time variable
Perhaps cut.POSIXt (which is a generic so you can just call cut) depending on the unstated form of your time object. Michael On Thu, Feb 9, 2012 at 12:15 PM, Abraham Mathew abmathe...@gmail.com wrote: I have the following variable, time, which is a character variable and it's structured as follows. head(as.character(dat$time), 30) [1] 00:00:01 00:00:16 00:00:24 00:00:25 00:00:25 00:00:40 00:01:50 00:01:54 00:02:33 00:02:43 00:03:22 [12] 00:03:31 00:03:41 00:03:42 00:03:43 00:04:04 00:05:09 00:05:17 00:05:19 00:05:21 00:05:22 00:05:22 [23] 00:05:28 00:05:44 00:05:54 00:06:54 00:06:54 00:07:10 00:08:15 00:08:26 What I am trying to do is group the data into one hour increment. So 5:01-6:00am, 6:01-7:00am, 7:01-8:00a, and so forth. However, I'm not sure if there's a simple route to do this in R or how to do it. Can anyone point me in the right direction? -- *Abraham Mathew Statistical Analyst www.amathew.com 720-648-0108 @abmathewks* [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
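A short sketch of the cut() route Michael points at, with sample times taken from the question. Note that cut's "hour" breaks are plain clock hours; the 5:01-6:00 style boundaries asked about would need the breaks shifted by a minute:

```r
# Parse the "HH:MM:SS" strings to POSIXct (today's date gets attached; only
# the time of day matters for hourly binning)
tm <- as.POSIXct(c("00:00:01", "00:59:59", "01:30:00", "05:10:00", "05:59:00"),
                 format = "%H:%M:%S", tz = "UTC")

hr <- cut(tm, breaks = "hour")  # cut.POSIXt: one factor level per one-hour bin
table(hr)                       # counts per hour
```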
[R] Grouping miliseconds By Hours
I have a list of numbers corresponding to timestamps, a sample of which follows: c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042, 1327761389, 1327780993, 1327815670, 1327822964, 1327897497, 1327897527, 1327937072, 1327938300, 1327957589, 1328044466, 1328127921, 1328157588, 1328213951, 1328236836, 1328300276, 1328335936, 1328429102) I would like to group these into hours. In other words, something like: c( 2012-01-31 21:14:26 PST 2012-02-01 20:25:21 PST 2012-02-02 04:39:48 PST 2012-02-02 20:19:11 PST 2012-02-03 02:40:36 PST 2012-02-03 20:17:56 PST 2012-02-04 06:12:16 PST 2012-02-05 08:05:02 PST)
Hour Hits
21 1
20 3
4 1
2 1
6 1
8 1
How would I do this without too much pain (from a CPU perspective)? This is a subset of a million entries and I would rather not go through these manually... So, any advice? Many thanks! -- H -- Sent from my mobile device
Re: [R] Grouping miliseconds By Hours
Is this what you are after:
x <- c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
+ 1327761389, 1327780993, 1327815670, 1327822964, 1327897497, 1327897527,
+ 1327937072, 1327938300, 1327957589, 1328044466, 1328127921, 1328157588,
+ 1328213951, 1328236836, 1328300276, 1328335936, 1328429102)
x <- as.POSIXct(x, origin = '1970-1-1')
x
[1] 2012-01-22 05:49:18 EST 2012-01-22 08:46:39 EST 2012-01-25 21:34:56 EST
[4] 2012-01-26 05:23:53 EST 2012-01-27 21:50:42 EST 2012-01-28 14:36:29 EST
[7] 2012-01-28 20:03:13 EST 2012-01-29 05:41:10 EST 2012-01-29 07:42:44 EST
[10] 2012-01-30 04:24:57 EST 2012-01-30 04:25:27 EST 2012-01-30 15:24:32 EST
[13] 2012-01-30 15:45:00 EST 2012-01-30 21:06:29 EST 2012-01-31 21:14:26 EST
[16] 2012-02-01 20:25:21 EST 2012-02-02 04:39:48 EST 2012-02-02 20:19:11 EST
[19] 2012-02-03 02:40:36 EST 2012-02-03 20:17:56 EST 2012-02-04 06:12:16 EST
[22] 2012-02-05 08:05:02 EST
table(format(x, "%H"))
02 04 05 06 07 08 14 15 20 21
 1  3  3  1  1  2  1  2  4  4
On Sun, Feb 5, 2012 at 4:54 AM, Hasan Diwan hasan.di...@gmail.com wrote: I have a list of numbers corresponding to timestamps, a sample of which follows: c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042, 1327761389, 1327780993, 1327815670, 1327822964, 1327897497, 1327897527, 1327937072, 1327938300, 1327957589, 1328044466, 1328127921, 1328157588, 1328213951, 1328236836, 1328300276, 1328335936, 1328429102) I would like to group these into hours. In other words, something like: c( 2012-01-31 21:14:26 PST 2012-02-01 20:25:21 PST 2012-02-02 04:39:48 PST 2012-02-02 20:19:11 PST 2012-02-03 02:40:36 PST 2012-02-03 20:17:56 PST 2012-02-04 06:12:16 PST 2012-02-05 08:05:02 PST) Hour Hits 21 1 20 3 4 1 2 1 6 1 8 1 How would I do this without too much pain (from a CPU perspective)? This is a subset of a million entries and I would rather not go through these manually... So, any advice? Many thanks!
-- H -- Sent from my mobile device Envoyait de mon portable __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping miliseconds By Hours
On Feb 5, 2012, at 9:54 AM, jim holtman wrote: Is this what you are after:
x <- c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
+ 1327761389, 1327780993, 1327815670, 1327822964, 1327897497, 1327897527,
+ 1327937072, 1327938300, 1327957589, 1328044466, 1328127921, 1328157588,
+ 1328213951, 1328236836, 1328300276, 1328335936, 1328429102)
x <- as.POSIXct(x, origin = '1970-1-1')
x
[1] 2012-01-22 05:49:18 EST 2012-01-22 08:46:39 EST 2012-01-25 21:34:56 EST
[4] 2012-01-26 05:23:53 EST 2012-01-27 21:50:42 EST 2012-01-28 14:36:29 EST
[7] 2012-01-28 20:03:13 EST 2012-01-29 05:41:10 EST 2012-01-29 07:42:44 EST
[10] 2012-01-30 04:24:57 EST 2012-01-30 04:25:27 EST 2012-01-30 15:24:32 EST
[13] 2012-01-30 15:45:00 EST 2012-01-30 21:06:29 EST 2012-01-31 21:14:26 EST
[16] 2012-02-01 20:25:21 EST 2012-02-02 04:39:48 EST 2012-02-02 20:19:11 EST
[19] 2012-02-03 02:40:36 EST 2012-02-03 20:17:56 EST 2012-02-04 06:12:16 EST
[22] 2012-02-05 08:05:02 EST
table(format(x, "%H"))
02 04 05 06 07 08 14 15 20 21
 1  3  3  1  1  2  1  2  4  4
It's possible that you may not realize that jim holtman has implicitly given you a handle on doing operations on such groups, since you could use the value of format(x, "%H") as the indexing argument in tapply, ave, or aggregate. -- David. On Sun, Feb 5, 2012 at 4:54 AM, Hasan Diwan hasan.di...@gmail.com wrote: I have a list of numbers corresponding to timestamps, a sample of which follows: c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042, 1327761389, 1327780993, 1327815670, 1327822964, 1327897497, 1327897527, 1327937072, 1327938300, 1327957589, 1328044466, 1328127921, 1328157588, 1328213951, 1328236836, 1328300276, 1328335936, 1328429102) I would like to group these into hours.
In other words, something like: c( 2012-01-31 21:14:26 PST 2012-02-01 20:25:21 PST 2012-02-02 04:39:48 PST 2012-02-02 20:19:11 PST 2012-02-03 02:40:36 PST 2012-02-03 20:17:56 PST 2012-02-04 06:12:16 PST 2012-02-05 08:05:02 PST)
Hour Hits
21 1
20 3
4 1
2 1
6 1
8 1
How would I do this without too much pain (from a CPU perspective)? This is a subset of a million entries and I would rather not go through these manually... So, any advice? Many thanks! -- H -- Sent from my mobile device David Winsemius, MD West Hartford, CT
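Concretely, David's follow-up amounts to using the formatted hour string as the grouping index. A sketch with the first few epoch values from the question (the tapply/aggregate calls and the Hour/Hits names are illustrative):

```r
x <- as.POSIXct(c(1327211358, 1327221999, 1327527296, 1327555433),
                origin = "1970-01-01", tz = "UTC")
hour <- format(x, "%H")   # two-digit hour-of-day strings

tapply(x, hour, length)   # hits per hour, as a named vector
aggregate(list(Hits = x), by = list(Hour = hour), FUN = length)  # same, as a data frame
```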
Re: [R] Grouping clusters from dendrograms
Hi Julia, sorry for the very late reply, your original email was posted while I was on hiatus from R-help. I'm the author of the dynamicTreeCut package. I recommend that you try the hybrid method via the cutreeDynamic function. What you observed is a known problem of the tree method (which, by the way, was the reason I developed the hybrid method). Using the hybrid method is simple, for example:
cut2 <- cutreeDynamic(dendro, distM = combo2, maxTreeHeight = 1, deepSplit = 2, minModuleSize = 1)
You can play with the argument deepSplit to obtain finer or coarser modules. HTH, Peter -- View this message in context: http://r.789695.n4.nabble.com/Grouping-clusters-from-dendrograms-tp2316521p3988526.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Grouping variables in a data frame
On Sat, Aug 27, 2011 at 7:26 AM, Andra Isan andra_i...@yahoo.com wrote: Hi All, I have a data frame as follows: user_id time age location gender . and I fit a logistic regression to learn the weights (glm with family=binomial(link = "logit")); my response value is either zero or one. I would like to group the users based on user_id and time and see the y values and predicted y values at the same time, or plot them somehow. Is there any way to group them together so that I can learn more about my data? It's very difficult to help you because you haven't followed the posting guide. But I suspect you're looking for the following:
require(plyr)
Loading required package: plyr
data(mtcars)
## considering 'gear' as 'id' and 'carb' as 'time'
ddply(mtcars, .(gear, carb), function(x) mean(x$hp))
   gear carb    V1
1     3    1 104.0
2     3    2 162.5
3     3    3 180.0
4     3    4 228.0
5     4    1  72.5
6     4    2  79.5
7     4    4 116.5
8     5    2 102.0
9     5    4 264.0
10    5    6 175.0
11    5    8 335.0
This will compute the mean of 'hp' for each group of id and time. Liviu I would like to get these at the end: user_id time y predicted_y Thanks a lot, Andra -- Do you know how to read? http://www.alienetworks.com/srtest.cfm http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader Do you know how to write? http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
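For completeness, the same per-group mean can be had without plyr; this base-R aggregate() equivalent is my substitution, not part of the original reply:

```r
data(mtcars)
# Mean hp for every observed gear/carb combination, returned as a data frame
res <- aggregate(hp ~ gear + carb, data = mtcars, FUN = mean)
res
```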
[R] Grouping variables in a data frame
Hi All, I have a data frame as follows: user_id time age location gender . and I fit a logistic regression to learn the weights (glm with family=binomial(link = "logit")); my response value is either zero or one. I would like to group the users based on user_id and time and see the y values and predicted y values at the same time, or plot them somehow. Is there any way to group them together so that I can learn more about my data by grouping them? I would like to get these at the end: user_id time y predicted_y Thanks a lot, Andra
Re: [R] Grouping columns
Hi @ all, both possibilities are working very well. Thanks a lot for the fast help! Best Greetinx from the Earth Eater, Geophagus
Re: [R] grouping data
adolfpf wrote:

How do I group my data in dolf the same way the data Orthodont are grouped?

show(dolf)
  distance   age Subject Sex
1  6.83679 22.01      F1   F
2  6.63245 23.04      F1   F
3 11.58730 39.26      M2   M

I know that many examples in that excellent book use grouped data, but the concept of grouped data is more confusing than helpful. I only got started using nlme/lme when I realized that everything could be done without grouped data. Too bad that many examples in Pinheiro/Bates rely on the concept (but no longer do in the coming lme4). So I suggest that you try to solve the problem with vanilla data frames instead of grouped ones. In most cases, it only means that you have to put the formula into the lme(..) call instead of relying on some hidden defaults. Dieter
[R] grouping data
All the examples in 'nlme' are in Grouped Data: distance ~ age | Subject format. How do I group my data in dolf the same way the data Orthodont are grouped?

show(dolf)
  distance   age Subject Sex
1  6.83679 22.01      F1   F
2  6.63245 23.04      F1   F
3 11.58730 39.26      M2   M

show(Orthodont)
Grouped Data: distance ~ age | Subject
  distance age Subject  Sex
1     26.0   8     M01 Male
2     25.0  10     M01 Male
3     29.0  12     M01 Male
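The conversion being asked about is what nlme::groupedData() does. A sketch, assuming dolf is an ordinary data frame with the columns shown (the three rows below are rebuilt from the printout in the post):

```r
library(nlme)

## dolf rebuilt from the rows shown in the post
dolf <- data.frame(
  distance = c(6.83679, 6.63245, 11.58730),
  age      = c(22.01, 23.04, 39.26),
  Subject  = c("F1", "F1", "M2"),
  Sex      = c("F", "F", "M")
)

## attach the grouping structure, mirroring Orthodont's
## "Grouped Data: distance ~ age | Subject" header
dolfGrouped <- groupedData(distance ~ age | Subject, data = dolf)
class(dolfGrouped)
```

As Dieter notes in his reply, the same model can be fit from a plain data frame by spelling the formula out in the lme() call, so this step is optional.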
Re: [R] Grouping columns
untested because I don't have access to your data, but this should work:

b13.NEW <- b13[, c("Gesamt", "Wasser", "Boden", "Luft", "Abwasser",
                   "Gefährliche Abfälle", "nicht gefährliche Abfälle")]

Geophagus wrote:

Hi @ all, I have a question concerning the possibility of grouping the columns of a matrix. R orders the columns alphabetically. What can I do to order the columns to my specification? The script is the following:

# R script: counts of xyz
# read the source file
b <- read.csv2("Z:/int/xyz.csv", header = TRUE)
# generate subsets for the individual years
b1 <- subset(b, jahr == 2007)
b2 <- subset(b, jahr == 2008)
b3 <- subset(b, jahr == 2009)
# tapply for each year on the respective BranchenID
b1_1 <- tapply(b1$betriebs_id, b1$umweltkompartiment, length)
b1_2 <- tapply(b2$betriebs_id, b2$umweltkompartiment, length)
b1_3 <- tapply(b3$betriebs_id, b3$umweltkompartiment, length)
# combine the results
b11 <- rbind(b1_1, b1_2, b1_3)
Gesamt <- apply(X = b11, MARGIN = 1, sum)
b13 <- cbind(Gesamt, b11)
b13
     Gesamt Abwasser Boden Gefährliche Abfälle Luft nicht gefährliche Abfälle Wasser
b1_1   9832      432    18                3147 2839                      1592   1804
b1_2  10271      413    28                3360 2920                      1715   1835
b1_3   9983      404    21                3405 2741                      1691   1721

Now I want to have the following order of the columns: Gesamt, Wasser, Boden, Luft, Abwasser, Gefährliche Abfälle, nicht gefährliche Abfälle. Thanks a lot for your answers! Fak
[R] Grouping columns
Hi @ all, I have a question concerning the possibility of grouping the columns of a matrix. R orders the columns alphabetically. What can I do to order the columns to my specification? The script is the following:

# R script: counts of xyz
# read the source file
b <- read.csv2("Z:/int/xyz.csv", header = TRUE)
# generate subsets for the individual years
b1 <- subset(b, jahr == 2007)
b2 <- subset(b, jahr == 2008)
b3 <- subset(b, jahr == 2009)
# tapply for each year on the respective BranchenID
b1_1 <- tapply(b1$betriebs_id, b1$umweltkompartiment, length)
b1_2 <- tapply(b2$betriebs_id, b2$umweltkompartiment, length)
b1_3 <- tapply(b3$betriebs_id, b3$umweltkompartiment, length)
# combine the results
b11 <- rbind(b1_1, b1_2, b1_3)
Gesamt <- apply(X = b11, MARGIN = 1, sum)
b13 <- cbind(Gesamt, b11)
b13
     Gesamt Abwasser Boden Gefährliche Abfälle Luft nicht gefährliche Abfälle Wasser
b1_1   9832      432    18                3147 2839                      1592   1804
b1_2  10271      413    28                3360 2920                      1715   1835
b1_3   9983      404    21                3405 2741                      1691   1721

Now I want to have the following order of the columns: Gesamt, Wasser, Boden, Luft, Abwasser, Gefährliche Abfälle, nicht gefährliche Abfälle. Thanks a lot for your answers! Fak
Re: [R] Grouping columns
On Jul 20, 2011, at 10:42 AM, Geophagus wrote:

Hi @ all, I have a question concerning the possibility of grouping the columns of a matrix. R orders the columns alphabetically. What can I do to order the columns to my specification?

Dear Earth Eater; You can create a factor whose levels are ordered to your specification. Your column umweltkompartiment obviously has those levels. This might also offer advantages in situations where there was not complete representation of all levels in all the files. So your tapply() calls could have been of this form:

b1_1 <- tapply(b1$betriebs_id,
               factor(b1$umweltkompartiment,
                      levels = c("Gesamt", "Wasser", "Boden", "Luft", "Abwasser",
                                 "Gefährliche Abfälle", "nicht gefährliche Abfälle")),
               length)

# the code would be more compact if you created a vector of levels and used it as an argument to factor:
faclevs <- c("Gesamt", "Wasser", "Boden", "Luft", "Abwasser",
             "Gefährliche Abfälle", "nicht gefährliche Abfälle")
b1_1 <- tapply(b1$betriebs_id,
               factor(b1$umweltkompartiment, levels = faclevs),
               length)

lather, rinse, repeat x 3 -- David.

[snip]

David Winsemius, MD West Hartford, CT
[R] Grouping data in ranges in table
Working with the built-in R data set Orange, e.g. with(Orange, table(age, circumference)). How should I go about grouping the ages and circumferences in the following ranges and having them display as such in a table?

age range: 118-664, 1004-1372, 1582
circumference range: 30-58, 62-115, 120-142, 145-177, 179-214

Thanks for any feedback and insights, as I am hoping for an output that looks something like the following:

            circumference range
age range   30-58  62-115  145-177
118-664       ...
1004-1372     ...
1582          ...

Thanks a ton.
Re: [R] Grouping data in ranges in table
?cut

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jason Rupert Sent: Saturday, March 05, 2011 3:38 PM To: R Project Help Subject: [R] Grouping data in ranges in table

Working with the built-in R data set Orange, e.g. with(Orange, table(age, circumference)). How should I go about grouping the ages and circumferences in the following ranges and having them display as such in a table? [snip]
Re: [R] Grouping data in ranges in table
Hi Jason, Something along the lines of

with(Orange, table(cut(age, breaks = c(118, 664, 1004, 1372, 1582, Inf)),
                   cut(circumference, breaks = c(30, 58, 62, 115, 145, 179, 214))))

should get you started. HTH, Jorge

On Sat, Mar 5, 2011 at 5:38 PM, Jason Rupert wrote: Working with the built-in R data set Orange, e.g. with(Orange, table(age, circumference)). How should I go about grouping the ages and circumferences in the following ranges and having them display as such in a table? [snip]
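Combining the two suggestions (?cut plus Jorge's break points) into something runnable. The break points come from the ranges in the question; the include.lowest settings and the labels are my additions:

```r
data(Orange)

## bin the ages into the three requested ranges
ageGrp <- cut(Orange$age,
              breaks = c(118, 664, 1372, 1582),
              include.lowest = TRUE,
              labels = c("118-664", "1004-1372", "1582"))

## bin the circumferences into the five requested ranges
circGrp <- cut(Orange$circumference,
               breaks = c(30, 58, 115, 142, 177, 214),
               include.lowest = TRUE,
               labels = c("30-58", "62-115", "120-142", "145-177", "179-214"))

## two-way count table with the requested range labels
table(age = ageGrp, circumference = circGrp)
```

With include.lowest = TRUE the lowest values (age 118, circumference 30) land in the first bin instead of becoming NA.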
[R] grouping data
Hi R-list, I have a data set with plot locations and observations and want to label them based on locations. For example, I have GPS information (x and y) as follows: x [1] -87.85092 -87.85092 -87.85092 -87.85093 -87.85093 -87.85093 -87.85094 [8] -87.85094 -87.85094 -87.85096 -87.85095 -87.85095 -87.85095 -87.85096 [15] -87.85096 -87.85096 -87.85096 -87.85088 -87.85088 -87.85087 -87.85087 [22] -87.85087 -87.85087 -87.85086 -87.85086 -87.85086 -87.85085 -87.85086 [29] -87.85085 -87.85085 -87.85084 -87.85084 -87.85084 -87.85084 -87.85075 [36] -87.85075 -87.85076 -87.85076 -87.85077 -87.85076 -87.85076 -87.85076 [43] -87.85077 -87.85077 -87.85077 -87.85077 -87.85077 -87.85077 -87.85070 [50] -87.85072 -87.85073 -87.85075 -87.85078 -87.85079 -87.85082 -87.85084 [57] -87.85077 -87.85078 -87.85078 -87.85078 -87.85078 -87.85078 -87.85079 [64] -87.85079 -87.85080 -87.85080 -87.85071 -87.85071 -87.85071 -87.85070 [71] -87.85071 -87.85079 -87.85071 -87.85070 -87.85070 -87.85069 -87.85069 [78] -87.85069 -87.85069 -87.85068 -87.85068 -87.85068 -87.85067 -87.85059 [85] -87.85060 -87.85060 -87.85060 -87.85061 -87.85061 -87.85061 -87.85061 [92] -87.85061 -87.85062 -87.85062 -87.85062 -87.85062 -87.85063 -87.85063 [99] -87.85063 -87.85055 -87.85055 -87.85055 -87.85054 -87.85054 -87.85053 [106] -87.85053 -87.85053 -87.85053 -87.85053 -87.85052 -87.85052 -87.85052 [113] -87.85052 -87.85051 -87.85051 -87.85043 -87.85043 -87.85044 -87.85044 [120] -87.85044 -87.85045 -87.85045 -87.85045 -87.85045 -87.85046 -87.85046 [127] -87.85046 -87.85046 -87.85047 -87.85047 -87.85039 -87.85039 -87.85038 [134] -87.85038 -87.85038 -87.85037 -87.85037 -87.85037 -87.85037 -87.85036 [141] -87.85036 -87.85036 -87.85035 -87.85035 -87.85035 -87.85027 -87.85027 [148] -87.85027 -87.85027 -87.85028 -87.85028 -87.85028 -87.85029 -87.85029 [155] -87.85029 -87.85029 -87.85029 -87.85030 -87.85030 -87.85030 -87.85022 [162] -87.85022 -87.85022 -87.85021 -87.85021 -87.85021 -87.85020 -87.85020 [169] -87.85020 
-87.85020 -87.85019 -87.85019 -87.85019 -87.85019 -87.85011 [176] -87.85011 -87.85011 -87.85011 -87.85012 -87.85012 -87.85012 -87.85012 [183] -87.85013 -87.85013 -87.85013 -87.85014 -87.85014 -87.85014 -87.85006 [190] -87.85006 -87.85006 -87.85005 -87.85005 -87.85004 -87.85004 -87.85004 [197] -87.85004 -87.85003 -87.85003 -87.85003 -87.85002 -87.85003 -87.84994 [204] -87.84994 -87.84995 -87.84995 -87.84995 -87.84995 -87.84996 -87.84996 [211] -87.84996 -87.84996 -87.84996 -87.84996 -87.84996 -87.84996 -87.84996 [218] -87.84996 -87.84996 -87.84996 -87.84996 -87.84990 -87.84991 -87.84993 [225] -87.84995 -87.84998 -87.84999 -87.85001 -87.85003 -87.84996 -87.84998 [232] -87.84997 -87.84998 -87.84989 -87.84990 -87.84989 -87.84989 -87.84988 [239] -87.84988 -87.84988 -87.84988 -87.84988 -87.84987 -87.84987 -87.84987 [246] -87.84987 -87.84978 -87.84978 -87.84979 -87.84979 -87.84979 -87.84979 [253] -87.84979 -87.84980 -87.84980 -87.84981 -87.84980 -87.84981 -87.84981 [260] -87.84973 -87.84973 -87.84973 -87.84972 -87.84972 -87.84972 -87.84971 [267] -87.84971 -87.84971 -87.84970 -87.84970 -87.84970 -87.84963 -87.84963 [274] -87.84963 -87.84963 -87.84963 -87.84964 -87.84964 -87.84965 -87.84964 [281] -87.84964 -87.84965 -87.84957 -87.84957 -87.84956 -87.84956 -87.84958 [288] -87.84958 y [1] 33.90342 33.90335 33.90328 33.90321 33.90314 33.90308 33.90301 33.90294 [9] 33.90287 33.90280 33.90274 33.90267 33.90260 33.90253 33.90246 33.90240 [17] 33.90233 33.90232 33.90239 33.90245 33.90252 33.90259 33.90266 33.90273 [25] 33.90279 33.90286 33.90293 33.90300 33.90307 33.90314 33.90321 33.90327 [33] 33.90334 33.90339 33.90337 33.90335 33.90328 33.90321 33.90319 33.90318 [41] 33.90317 33.90316 33.90315 33.90313 33.90312 33.90310 33.90309 33.90307 [49] 33.90314 33.90314 33.90314 33.90314 33.90314 33.90314 33.90314 33.90314 [57] 33.90300 33.90294 33.90287 33.90280 33.90273 33.90266 33.90252 33.90245 [65] 33.90239 33.90232 33.90231 33.90237 33.90245 33.90251 33.90258 33.90259 [73] 33.90265 
33.90272 33.90279 33.90286 33.90292 33.90299 33.90306 33.90313 [81] 33.90320 33.90326 33.90334 33.90332 33.90327 33.90320 33.90314 33.90307 [89] 33.90300 33.90293 33.90286 33.90279 33.90272 33.90265 33.90258 33.90252 [97] 33.90245 33.90238 33.90231 33.90231 33.90237 33.90243 33.90250 33.90257 [105] 33.90264 33.90271 33.90278 33.90285 33.90292 33.90298 33.90306 33.90312 [113] 33.90319 33.90326 33.90329 33.90326 33.90319 33.90312 33.90306 33.90299 [121] 33.90292 33.90286 33.90279 33.90272 33.90265 33.90258 33.90251 33.90245 [129] 33.90237 33.90231 33.90230 33.90236 33.90243 33.90250 33.90257 33.90264 [137] 33.90271 33.90277 33.90284 33.90291 33.90298 33.90305 33.90311 33.90319 [145] 33.90325 33.90323 33.90319 33.90312 33.90305 33.90299 33.90291 33.90285 [153] 33.90278 33.90272 33.90264 33.90257 33.90250 33.90243 33.90237 33.90230 [161] 33.90229 33.90235 33.90243 33.90250 33.90256 33.90263 33.90270 33.90277 [169] 33.90283 33.90290 33.90297 33.90304 33.90311
Re: [R] grouping data
Hi Steve, Just test whether y is greater than the predicted y (i.e., your line).

## function using the model coefficients*
f <- function(x) {82.9996 + (0.5589 * x)}
## find group membership
group <- ifelse(y > f(x), "A", "B")

*Note that depending on how accurate this needs to be, you will probably want to use the model itself rather than just reading from the printout like I did. If you need to do that, take a look at ?predict. For future reference, it would be easier for readers if you provided your data via something like dput(x), which can be copied directly into the R console. Also, if you are generating random data (rnorm()), you can use set.seed() so that we can replicate exactly what you get. HTH, Josh

On Fri, Mar 4, 2011 at 1:39 PM, Steve Hong empti...@gmail.com wrote: Hi R-list, I have a data set with plot locations and observations and want to label them based on locations. For example, I have GPS information (x and y) as follows: [snip]

(fm1 <- lm(ysim ~ xsim))
Call: lm(formula = ysim ~ xsim)
Coefficients:
(Intercept)         xsim
    82.9996       0.5589

I overlapped the fitted line on the plot: abline(fm1). My question is: as you can see in the plot, how can I label (or re-group) those in the upper diagonal as (say) 'A' and the others in the lower diagonal as 'B'? Thanks a lot in advance!!! Steve

-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
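A self-contained version of Josh's suggestion, using predict() on the model object instead of retyping the coefficients. The data here are simulated, since the original xsim/ysim were never posted:

```r
set.seed(1)  # so the result is reproducible, as Josh recommends

## simulated stand-ins for the poster's xsim/ysim
xsim <- runif(50, 0, 100)
ysim <- 83 + 0.56 * xsim + rnorm(50, sd = 10)

fm1 <- lm(ysim ~ xsim)

## label points above the fitted line "A" and the rest "B"
grp <- ifelse(ysim > predict(fm1), "A", "B")
table(grp)
```

Using predict(fm1) avoids the rounding error introduced by copying coefficients from the printed summary.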
Re: [R] grouping and counting in dataframe
Does nobody have any idea? I have already tried tapply(d, gr, ...) but I have problems with the choice of the function ... also I am not really sure if that is the right direction with tapply ... it would be really great if somebody came up with a new suggestion. 10x
Re: [R] grouping and counting in dataframe
Here is one solution; mine differs since there should be at least one item in the range, which would be the row itself:

d
      tm gr
1  12345  1
2  42352  3
3  12435  1
4  67546  2
5  24234  2
6  76543  4
7  31243  2
8  13334  3
9  64562  3
10 64123  3

d$ct <- ave(d$tm, d$gr, FUN = function(x){
  # determine count in the range
  sapply(x, function(a) sum((x >= a - 500) & (x <= a + 500)))
})
d
      tm gr ct
1  12345  1  2
2  42352  3  1
3  12435  1  2
4  67546  2  1
5  24234  2  1
6  76543  4  1
7  31243  2  1
8  13334  3  1
9  64562  3  2
10 64123  3  2

On Sat, Feb 26, 2011 at 5:10 PM, zem zmanol...@gmail.com wrote: sry, new try:

tm <- c(12345, 42352, 12435, 67546, 24234, 76543, 31243, 13334, 64562, 64123)
gr <- c(1, 3, 1, 2, 2, 4, 2, 3, 3, 3)
d <- data.frame(tm, gr)

[snip]

-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
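Jim's ave()/sapply() answer as a complete script against the posted data. Note that, as he says, each row counts itself, so the minimum count is 1 rather than the 0 in zem's sketch:

```r
tm <- c(12345, 42352, 12435, 67546, 24234, 76543, 31243, 13334, 64562, 64123)
gr <- c(1, 3, 1, 2, 2, 4, 2, 3, 3, 3)
d  <- data.frame(tm, gr)
k  <- 500

## for each row, count how many tm values in the same group
## fall inside [tm - k, tm + k]
d$ct <- ave(d$tm, d$gr, FUN = function(x)
  sapply(x, function(a) sum(x >= a - k & x <= a + k)))

d$ct  # 2 1 2 1 1 1 1 1 2 2
```

ave() splits d$tm by d$gr, applies the counting function within each group, and puts the results back in the original row order, which is exactly the per-row, per-group shape zem asked for.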
Re: [R] grouping and counting in dataframe
sry, new try:

tm <- c(12345, 42352, 12435, 67546, 24234, 76543, 31243, 13334, 64562, 64123)
gr <- c(1, 3, 1, 2, 2, 4, 2, 3, 3, 3)
d <- data.frame(tm, gr)

where tm are Unix times and gr the factor to group by. I have a scalar, for example k = 500. Now I need to calculate, for every row, how many examples in the same group are in the interval [i-500; i+500], where i is the current tm element, like this:

d
      tm gr ct
1  12345  1  2
2  42352  3  0
3  12435  1  2
4  67546  2  0
5  24234  2  0
6  76543  4  0
7  31243  2  0
8  13334  3  0
9  64562  3  2
10 64123  3  2

I hope that was a better illustration of my problem
[R] grouping and counting in dataframe
hi all, i have a little problem: i have some code written, but it is too slow... i have a dataframe with a column of time series and a grouping column. really it does not matter what kind of data is in the first column, it can be a random number like this:

x <- rnorm(10)
gr <- c(1, 3, 1, 2, 2, 4, 2, 3, 3, 3)
x <- cbind(x, gr)

now i have to look, for every row i and its group, how many of the values in x[,1] are in the range x[i,1] +/- k (k is another number). thanks in advance
Re: [R] grouping and counting in dataframe
On Feb 25, 2011, at 8:28 PM, zem wrote:

hi all, i have a little problem: i have some code written, but it is too slow... i have a dataframe with a column of time series and a grouping column. really it does not matter what kind of data is in the first column, it can be a random number like this:

x <- rnorm(10)
gr <- c(1, 3, 1, 2, 2, 4, 2, 3, 3, 3)
x <- cbind(x, gr)

That is not a dataframe. It is a matrix. And not all time series objects are the same, so you should not assume that any old two-column object will respond the same way to R functions.

now i have to look, for every row i and its group, how many of the values in x[,1] are in the range x[i,1] +/- k (k is another number)

You may find that the function findInterval is useful. I cannot determine what your goal is from the description, and there is no complete example with a specification of what correct output would be, as you should have seen requested in the Posting Guide.

-- David Winsemius, MD West Hartford, CT
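findInterval(), which David mentions, in a tiny illustrative example (the breakpoints are made up):

```r
## which bin of the sorted breakpoints c(0, 10, 20) each value falls into:
## values in [0, 10) get 1, [10, 20) get 2, and >= 20 get 3
x <- c(3, 12, 25)
findInterval(x, c(0, 10, 20))
# returns 1 2 3
```

For the windowed-count problem here, findInterval on a sorted copy of each group's times would give the window endpoints without scanning the whole group per row, which addresses zem's "too slow" complaint.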
[R] Grouping by factors in R
I'm having a hard time figuring out how to group results by certain factors in R. I have data with the following headings:

 [1] Time      Plot      LatCat    Elevation ElevCat   Aspect    AspCat    Slope
 [9] SlopeCat  Species   SizeClass Stems

and I'm trying to use a GLM to test differences in Stems for different categories/factors - most importantly, I want to group things so that I see results by SizeClass and then by Species. This is pretty easy in SAS using the Group By command, but in R, I haven't figured it out. I've tried using the following code:

stems139GLM <- glm(Stems ~ Time | SizeClass | Species, family=poisson, data=stems139)

but R gives me this message:

Error in pmax(exp(eta), .Machine$double.eps) : cannot mix 0-length vectors with others
In addition: Warning messages:
1: In Ops.factor(Time, SizeClass) : | not meaningful for factors
2: In Ops.factor(Time | SizeClass, Species) : | not meaningful for factors

I'd appreciate any help. Thanks.

-- Christopher R. Dolanc PhD Candidate Ecology Graduate Group University of California, Davis Lab Phone: (530) 752-2644 (Barbour lab)
Re: [R] Grouping by factors in R
Hi: One approach would be to use dlply() from the plyr package to generate the models and assign the results to a list, something like the following:

library(plyr)
# function to run the GLM in each data subset - the argument is a generic data subset d
gfun <- function(d) glm(Stems ~ Time, data = d, family = poisson)
mlist <- dlply(stems139, .(SizeClass, Species), gfun)

To see the result, try mlist[[1]] or summary(mlist[[1]]) to execute the print and summary methods on the first fitted model. Each output list object from glm() is a list component of mlist, so mlist is actually a list of lists. You can extract various pieces from mlist by using ldply() with a suitable extraction function or by use of the do.call/lapply combination. All of this is untested since no minimal example was provided per instructions in the Posting Guide... HTH, Dennis

On Tue, Feb 8, 2011 at 11:54 AM, Christopher R. Dolanc crdol...@ucdavis.edu wrote: I'm having a hard time figuring out how to group results by certain factors in R. [snip]
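Dennis's per-group-model recipe, restated in base R for anyone without plyr. The stems139 data frame here is simulated with made-up column values, since the original was never posted:

```r
set.seed(7)

## made-up stand-in for stems139: balanced Time/SizeClass/Species design
stems139 <- data.frame(
  Time      = factor(rep(c("T1", "T1", "T2", "T2"), times = 10)),
  SizeClass = factor(rep(1:2, times = 20)),
  Species   = factor(rep(c("ABMA", "PICO"), each = 20)),
  Stems     = rpois(40, lambda = 3)
)

## one Poisson GLM per SizeClass/Species combination, as a named list
mlist <- lapply(split(stems139, list(stems139$SizeClass, stems139$Species)),
                function(d) glm(Stems ~ Time, data = d, family = poisson))

## pull the coefficients out of every fitted model
coefs <- t(sapply(mlist, coef))
coefs
```

split() with a list of factors produces one data frame per factor combination (named like "1.ABMA"), so lapply() over it plays the same role as dlply().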
Re: [R] Grouping by factors in R
I'm working on getting this to work - need to figure out how to extract pieces properly. In the mean time, I may have figured out an alternate method to group the factors by the following: stems139$SpeciesF - factor(stems139$Species) stems139GLM - glm(Stems ~ Time*SizeClassF*Species, family=poisson, data=stems139) summary(stems139GLM) Call: glm(formula = Stems ~ Time * SizeClassF * Species, family = poisson, data = stems139) Deviance Residuals: Min1QMedian3QMax -4.2308-1.0107-0.6786-0.339316.7415 Coefficients: Estimate Std. Error z value Pr(|z|) (Intercept)-0.6717940.118678-5.661 1.51e-08 *** TimeVTM-0.5738000.197698-2.902 0.003703 ** SizeClassF2-0.7661720.210684-3.637 0.000276 *** SizeClassF3-1.960095 0.337764-5.803 6.51e-09 *** SizeClassF4-2.6532420.462693-5.734 9.79e-09 *** SpeciesABMA1.8240950.12789514.262 2e-16 *** SpeciesJUOC-0.0882930.171666-0.514 0.607022 SpeciesPIAL1.9479200.12685615.355 2e-16 *** SpeciesPICO2.8634070.12201823.467 2e-16 *** SpeciesPIJE-0.5250100.194664-2.697 0.006997 ** SpeciesPIMO0.3720490.1542512.412 0.015866 * SpeciesTSME1.9194050.12708515.103 2e-16 *** TimeVTM:SizeClassF2-0.6201220.411567-1.507 0.131879 TimeVTM:SizeClassF30.7561220.4716121.603 0.108875 TimeVTM:SizeClassF40.9102730.6180141.473 0.140778 The problem now though, is that R for some reason does not list factor 1 in the output. Why would this be? On 2/8/2011 2:21 PM, Dennis Murphy wrote: Hi: One approach would be to use dlply() from the plyr package to generate the models and assign the results to a list, something like the following: library(plyr) # function to run the GLM in each data subset - the argument is a generic data subset d gfun - function(d) glm(Stems ~ Time, data = d, family = poisson) mlist - dlply(stems139, .(SizeClass, Species), gfun) To see the result, try mlist[[1]] or summary(mlist[[1]]) to execute the print and summary methods on the first fitted model. Each output list object from glm() is a list component of mlist, so mlist is actually a list of lists. 
You can extract various pieces from mlist by using ldply() with a suitable extraction function, or by a do.call()/lapply() combination. All of this is untested, since no minimal example was provided per the instructions in the Posting Guide... HTH, Dennis

On Tue, Feb 8, 2011 at 11:54 AM, Christopher R. Dolanc <crdol...@ucdavis.edu> wrote:
I'm having a hard time figuring out how to group results by certain factors in R. I have data with the following headings:

 [1] Time      Plot      LatCat    Elevation ElevCat   Aspect    AspCat    Slope
 [9] SlopeCat  Species   SizeClass Stems

and I'm trying to use a GLM to test differences in Stems for different categories/factors; most importantly, I want to group things so that I see results by SizeClass and then by Species. This is pretty easy in SAS using the Group By command, but in R I haven't figured it out. I've tried the following code:

stems139GLM <- glm(Stems ~ Time | SizeClass | Species, family=poisson, data=stems139)

but R gives me this message:

Error in pmax(exp(eta), .Machine$double.eps) : cannot mix 0-length vectors with others
In addition: Warning messages:
1: In Ops.factor(Time, SizeClass) : | not meaningful for factors
2: In Ops.factor(Time | SizeClass, Species) : | not meaningful for factors

I'd appreciate any help. Thanks.

--
Christopher R. Dolanc
PhD Candidate, Ecology Graduate Group
University of California, Davis
Lab Phone: (530) 752-2644 (Barbour lab)

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
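For readers without plyr, the split-apply pattern Dennis describes can be sketched in base R: split() the data frame by the grouping factors, fit the GLM in each subset, then pull out pieces with lapply()/sapply(). The data frame below is made up purely for illustration — the real stems139 data were never posted, so the column values here are hypothetical.

```r
# Toy stand-in for stems139 (the real data were never posted)
set.seed(1)
stems <- data.frame(
  Time      = factor(rep(c("FIA", "VTM"), each = 40)),
  SizeClass = factor(rep(1:2, times = 40)),
  Species   = factor(rep(c("ABMA", "ABMA", "PICO", "PICO"), times = 20)),
  Stems     = rpois(80, lambda = 3)
)

# Base-R equivalent of dlply(stems139, .(SizeClass, Species), gfun):
# split by the grouping factors, then fit one Poisson GLM per subset
mlist <- lapply(
  split(stems, list(stems$SizeClass, stems$Species)),
  function(d) glm(Stems ~ Time, data = d, family = poisson)
)

# Extract one piece per model, e.g. the Time coefficient
coefs <- sapply(mlist, function(m) coef(m)["TimeVTM"])
```

As for the "missing factor level 1" question: with R's default treatment contrasts, the first level of each factor is the reference level and is absorbed into the intercept, so it never gets its own coefficient row; use relevel() if you want a different baseline.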
[R] grouping question
Hello. I have what is probably a very simple grouping question; however, given my limited exposure to R, I have not found a solution yet, despite my research efforts and wild attempts at what I thought might produce some sort of result. I have a very simple list of integers that range between 1 and 24. These correspond to hours of the day. I am trying to create a grouping of Day and Night, with Day = 6 to 17.99, and Night = 1 to 5.59 and 18 to 24. Using the cut() command I can create the segments, but I have not found a combine-type command to merge the two night segments. No luck with if/else either. Any help would be greatly appreciated. Thank you, Will

--
View this message in context: http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019922.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] grouping question
try this:

x
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
y <- cut(x, breaks=c(-Inf, 6, 18, Inf), labels=c('a','b','c'))
levels(y) <- c('night','day','night')
y
 [1] night night night night night night night day   day   day   day   day   day   day   day   day   day   day
[19] day   night night night night night night
Levels: night day

On Fri, Oct 29, 2010 at 8:56 PM, will phillips <will.phill...@q.com> wrote: [original question quoted above]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
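Jim's trick works because assigning a levels vector with a duplicated name merges the corresponding levels: cut() first makes three distinct bins, and the replacement c('night','day','night') collapses the first and third into a single 'night' level. A minimal self-contained sketch of the same steps:

```r
x <- 0:24                      # hours of the day

# Three bins: (-Inf,6] -> 'a', (6,18] -> 'b', (18,Inf) -> 'c'
y <- cut(x, breaks = c(-Inf, 6, 18, Inf), labels = c("a", "b", "c"))

# Reassigning levels with a duplicated name merges bins 'a' and 'c'
levels(y) <- c("night", "day", "night")

table(y)
#> y
#> night   day
#>    13    12
```

Note that with the default right-closed intervals, hour 6 lands in 'night' and hour 18 in 'day'; adjust the breaks (or use right = FALSE) if you need the exact 6-to-17.99 cutoffs from the original question.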
Re: [R] grouping question
Hi Will, One way would be:

x
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
factor(ifelse(x > 6 & x < 18, 'day', 'night'))
 [1] night night night night night night night day   day   day   day   day   day   day   day
[16] day   day   day   night night night night night night night
Levels: day night

HTH, Jorge

On Fri, Oct 29, 2010 at 8:56 PM, will phillips wrote: [original question quoted above]
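Jorge's one-liner can also be wrapped in a small helper so the boundary logic lives in one place and is easy to reuse or adjust; the function name day_night below is just an illustrative choice, not anything from the original thread:

```r
# Classify hours as 'day' (strictly between 6 and 18) or 'night'
day_night <- function(hours) {
  factor(ifelse(hours > 6 & hours < 18, "day", "night"),
         levels = c("day", "night"))
}

table(day_night(0:24))
#>   day night
#>    11    14
```

Fixing the levels explicitly guarantees both categories exist even when the input happens to contain only daytime (or only nighttime) hours.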
Re: [R] grouping question
Hello Jim, Wow. I tried cut(), but I see you have an interim step with labels a, b, c and then levels night and day. I was really close to this: I had labels night, day, night, and it wouldn't let me duplicate labels. I am very grateful for your input. Will

--
View this message in context: http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019950.html
Re: [R] grouping question
Hello Jorge, Thank you for the reply. I tried a few different things with if/else but couldn't get them to go. I really appreciate your feedback; I learned something new from this. Will

--
View this message in context: http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019952.html