Re: [R] Problems with closing R
It can seem like R is hung when you default to saving your workspace on close and you have a very large workspace in memory relative to your hard drive's write speed. Are you sure it isn't that?

On Sun, Jul 7, 2019 at 1:18 PM Spencer Brackett <spbracket...@saintjosephhs.com> wrote:
> Thank you. I will try upgrading and see if that solves the problem.
>
> On Sun, Jul 7, 2019 at 4:08 PM Jeff Newmiller wrote:
> > A) You ask whether uninstalling RStudio will delete files... I don't think
> > so, but this is not the support area for RStudio.
> >
> > B) R will not delete your data files when uninstalled.
> >
> > C) I suspect that reinstalling software is unlikely to repair the symptoms
> > you describe (sounds like buggy software to me). Simply restarting RStudio
> > and plowing on would be about as effective but less work. However, there
> > have been mentions on this list of bugs in RStudio related to
> > incompatibility with R 3.6 [1], which might be related to your problems, so
> > upgrading RStudio to the beta or downgrading R to 3.5.3 may make a
> > difference.
> >
> > [1] https://stat.ethz.ch/pipermail/r-help/2019-July/463226.html
> >
> > On July 7, 2019 12:28:15 PM PDT, Spencer Brackett wrote:
> > > Hello,
> > >
> > > I am trying to quit a current session on RStudio and the "quitting
> > > session" prompt from R has just continued to load. I assume that R is
> > > not responding for some reason. If the problem persists, and I were to
> > > uninstall and then reinstall R, would my saved .RData and other R files
> > > and environments saved on my desktop be deleted?
> > >
> > > Not sure of any other solutions to this issue.
> > >
> > > Best,
> > > Spencer

R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
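If the slow workspace write on exit really is the culprit, one workaround (a sketch; `mydata` is a hypothetical placeholder object) is to save only the objects you need and then quit without dumping the whole workspace:

```r
# Save just the objects you care about...
save(mydata, file = "mydata.RData")  # 'mydata' is a placeholder name

# ...then quit without writing the potentially huge .RData file.
q(save = "no")
```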
Re: [R] difference between ifelse and if...else?
ifelse() is vectorized.

On Wed, Dec 13, 2017 at 7:31 AM, Jinsong Zhao wrote:
> Hi there,
>
> I don't know why the following two pieces of code return different results.
>
> > ifelse(3 > 2, 1:3, length(1:3))
> [1] 1
> > if (3 > 2) 1:3 else length(1:3)
> [1] 1 2 3
>
> Any hints?
>
> Best,
> Jinsong
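A short illustration of the difference described above:

```r
# ifelse() returns a result shaped like its first argument (the test).
# The test `3 > 2` has length 1, so only the first element of `1:3` survives:
ifelse(3 > 2, 1:3, length(1:3))
# [1] 1

# With a vector test, ifelse() chooses element-wise:
x <- c(-1, 2, 0)
ifelse(x > 0, "pos", "non-pos")
# [1] "non-pos" "pos"     "non-pos"

# `if (cond) a else b` evaluates one scalar condition and returns a branch whole:
if (3 > 2) 1:3 else length(1:3)
# [1] 1 2 3
```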
Re: [R] JSON data in data frame
I TA'd a course in R computing, and the first thing I told students was "inspect, inspect, inspect."

d1 <- fromJSON('http://api.openweathermap.org/data/2.5/group?id=524901,703448,2643743=metric=ec0313a918fa729d4372555ada5fb1f8')
names(d1)
str(d1)
d1
d1$list
your_data = d1$list

On Fri, Jan 13, 2017 at 1:12 AM, Archit Soni wrote:
> Hi All,
>
> Warm greetings. I am stuck at an issue converting an incoming JSON response
> to a data frame.
>
> I am using the code below to get the data:
>
> library(jsonlite)
> d1 <- fromJSON('http://api.openweathermap.org/data/2.5/group?id=524901,703448,2643743=metric=ec0313a918fa729d4372555ada5fb1f8')
>
> d2 <- as.data.frame(d1)
>
> typeof(d2)
> list
>
> Can you please guide me on how I can get this data into pure data.frame
> format? The list in d1 has nested data.frame objects.
>
> Note: if you are unable to get data from the API, you can use the JSON
> string below to test:
>
> JSON: {"cnt":3,"list":[{"coord":{"lon":37.62,"lat":55.75},"sys":{"type":1,"id":7323,"message":0.193,"country":"RU","sunrise":1484286631,"sunset":1484313983},"weather":[{"id":600,"main":"Snow","description":"light snow","icon":"13d"}],"main":{"temp":-3.75,"pressure":1005,"humidity":86,"temp_min":-4,"temp_max":-3},"visibility":8000,"wind":{"speed":4,"deg":170},"clouds":{"all":90},"dt":1484290800,"id":524901,"name":"Moscow"},{"coord":{"lon":30.52,"lat":50.43},"sys":{"type":1,"id":7358,"message":0.1885,"country":"UA","sunrise":1484286787,"sunset":1484317236},"weather":[{"id":804,"main":"Clouds","description":"overcast clouds","icon":"04d"}],"main":{"temp":-2,"pressure":1009,"humidity":92,"temp_min":-2,"temp_max":-2},"visibility":9000,"wind":{"speed":4,"deg":250,"var_beg":210,"var_end":270},"clouds":{"all":90},"dt":1484290800,"id":703448,"name":"Kiev"},{"coord":{"lon":-0.13,"lat":51.51},"sys":{"type":1,"id":5187,"message":0.1973,"country":"GB","sunrise":1484294413,"sunset":1484324321},"weather":[{"id":802,"main":"Clouds","description":"scattered clouds","icon":"03n"}],"main":{"temp":0.7,"pressure":1002,"temp_min":0,"temp_max":2,"humidity":98},"visibility":1,"wind":{"speed":6.2,"deg":270},"clouds":{"all":40},"dt":1484290200,"id":2643743,"name":"London"}]}
>
> Any help is appreciated.
>
> --
> Regards
> Archit
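Following the inspect-first advice, one way to get a flat data.frame from the nested result is jsonlite::flatten() on the `list` element (a sketch; `json_string` is assumed to hold the test JSON quoted in the question):

```r
library(jsonlite)

d1 <- fromJSON(json_string)          # json_string: the JSON from the post
str(d1, max.level = 2)               # inspect before converting

flat <- jsonlite::flatten(d1$list)   # unnest the data.frame columns (coord, main, ...)
class(flat)                          # "data.frame"
# Nested fields get compound names, e.g. "coord.lon", "main.temp", next to "name"
```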
Re: [R] About populating a dataframe in a loop
As a rule, never rbind() in a loop. It has O(n^2) run time, because each rbind() itself can be O(n) (where n is the number of data.frames). Instead, put the pieces into a list (with lapply() or vector("list", length = n)) and then bind once with data.table::rbindlist(), do.call(rbind, thelist), or the equivalent from dplyr. All of these are much more efficient.

On Fri, Jan 6, 2017 at 8:46 PM, lily li wrote:
> Hi Rui,
>
> Thanks for your reply. Yes, when I tried to rbind two dataframes, it worked.
> However, with more than 50 it got stuck for hours. When I tried to
> terminate the process and open the csv file separately, it had only one
> data frame. What is the problem? Thanks.
>
> On Fri, Jan 6, 2017 at 11:12 AM, Rui Barradas wrote:
> > Hello,
> >
> > Works for me:
> >
> > set.seed(6574)
> >
> > pre.mat = data.frame()
> > for (i in 1:10) {
> >   mat.temp = data.frame(x = rnorm(5), A = sample(LETTERS, 5, TRUE))
> >   pre.mat = rbind(pre.mat, mat.temp)
> > }
> >
> > nrow(pre.mat) # should be 50
> >
> > Can you give us an example that doesn't work?
> >
> > Rui Barradas
> >
> > Em 06-01-2017 18:00, lily li escreveu:
> > > Hi R users,
> > >
> > > I have a question about filling a dataframe in R using a for loop.
> > >
> > > I created an empty dataframe first and then filled it, using the code:
> > > pre.mat = data.frame()
> > > for (i in 1:10) {
> > >   mat.temp = data.frame(some values filled in)
> > >   pre.mat = rbind(pre.mat, mat.temp)
> > > }
> > > However, the resulting dataframe does not have all the rows I wanted.
> > > What is the problem and how can I solve it? Thanks.
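The advice above, applied to Rui's example: build the pieces in a preallocated list and bind once at the end (a sketch):

```r
set.seed(6574)

# Preallocate the list: one slot per data.frame piece.
pieces <- vector("list", length = 10)
for (i in 1:10) {
  pieces[[i]] <- data.frame(x = rnorm(5), A = sample(LETTERS, 5, TRUE))
}

# Bind once at the end: O(n) total work instead of O(n^2) from
# growing pre.mat with rbind() inside the loop.
result <- do.call(rbind, pieces)
nrow(result)  # 50

# Equivalently: data.table::rbindlist(pieces) or dplyr::bind_rows(pieces)
```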
Re: [R] unique dates per ID
library(data.table)
setDT(df)
setkeyv(df, c("Subject", "dates"))
unique(df)  # gets what you want

On Mon, Nov 14, 2016 at 11:38 PM, Jim Lemon wrote:
> Hi Farnoosh,
> Try this:
>
> for(id in unique(df$Subject)) {
>   whichsub <- df$Subject==id
>   if(exists("newdf"))
>     newdf <- rbind(newdf, df[whichsub,][which(!duplicated(df$dates[whichsub])),])
>   else newdf <- df[whichsub,][which(!duplicated(df$dates[whichsub])),]
> }
>
> Jim
>
> On Tue, Nov 15, 2016 at 9:38 AM, Farnoosh Sheikhi via R-help wrote:
> > Hi,
> > I have a data set like below:
> >
> > Subject <- c("2", "2", "2", "3", "3", "3", "4", "4", "5", "5", "5", "5")
> > dates <- c("2011-01-01", "2011-01-01", "2011-01-03", "2011-01-04",
> >            "2011-01-05", "2011-01-06", "2011-01-07", "2011-01-07",
> >            "2011-01-09", "2011-01-10", "2011-01-11", "2011-01-11")
> > deps <- c("A", "B", "CC", "C", "CC", "A", "F", "DD", "A", "F", "FF", "D")
> > df <- data.frame(Subject, dates, deps); df
> >
> > I want to choose unique dates per ID so that there are no duplicate
> > dates per ID. I don't mind which department gets picked. I really
> > appreciate any help.
> > Best,
> > Farnoosh
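For comparison, a base-R sketch of the same deduplication, using the data from the question:

```r
Subject <- c("2","2","2","3","3","3","4","4","5","5","5","5")
dates   <- c("2011-01-01","2011-01-01","2011-01-03","2011-01-04",
             "2011-01-05","2011-01-06","2011-01-07","2011-01-07",
             "2011-01-09","2011-01-10","2011-01-11","2011-01-11")
deps    <- c("A","B","CC","C","CC","A","F","DD","A","F","FF","D")
df <- data.frame(Subject, dates, deps)

# Keep only the first row for each (Subject, dates) combination.
newdf <- df[!duplicated(df[c("Subject", "dates")]), ]
nrow(newdf)  # 9 unique Subject/date pairs
```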
Re: [R] Function argument and scope
Hi,

I didn't bother to run the code because someone else said it might do what you intended, and your problem description was complete unto itself. The issue is that R copies on change. You are thinking as if you have a reference, which you do not. Reference semantics are not very R-like in style, but they can be had if you want via a change of input class (see new.env()). The typical R style is to make the modifications to the input argument, return it, and then assign the result back to the input object, e.g.

test = myFunction(test)

If you really have some reason to change the data.frame inside a function without re-assigning it, then check out data.table, which has that as a side effect of how it operates.

Thanks,

On Sun, Nov 13, 2016 at 2:09 PM, Bernardo Doré wrote:
> Hello list,
>
> My first post, but I've been using this list as a help source for a while.
> Couldn't live without it.
>
> I am writing a function that takes a dataframe as an argument, and at the
> end I intend to assign the result of some computation back to the
> dataframe. This is what I have so far:
>
> myFunction <- function(x){
>   y <- x[1,1]
>   z <- strsplit(as.character(y), split = " ")
>   if(length(z[[1]] > 1)){
>     predictedWord <- z[[1]][length(z[[1]])]
>     z <- z[[1]][-c(length(z[[1]]))]
>     z <- paste(z, collapse = " ")
>   }
>   x[1,1] <- z
> }
>
> And let's say I create my dataframe like this:
> test <- data.frame(var1=c("a","b","c"),var2=c("d","e","f"))
>
> and then call
> myFunction(test)
>
> The problem is that when I assign x[1,1] to y in the first operation inside
> the function, x becomes a dataframe inside the function scope and loses the
> reference to the dataframe "test" passed as the argument. In the end, when I
> assign z to what should be row 1, column 1 of the "test" dataframe, the
> assignment goes to x inside the function scope and no modification is made
> to "test".
>
> I hope the problem statement is clear.
>
> Thank you,
>
> Bernardo Doré
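A sketch of the return-and-reassign pattern described above, which also fixes the misplaced parenthesis in the posted function (`length(z[[1]] > 1)` should be `length(z[[1]]) > 1`):

```r
myFunction <- function(x) {
  z <- strsplit(as.character(x[1, 1]), split = " ")[[1]]
  if (length(z) > 1) {
    z <- paste(z[-length(z)], collapse = " ")  # drop the last word
    x[1, 1] <- z
  }
  x  # return the modified copy
}

test <- data.frame(var1 = c("a b c", "b", "c"), var2 = c("d", "e", "f"),
                   stringsAsFactors = FALSE)
test <- myFunction(test)  # assign the result back to see the change
test$var1[1]              # "a b"
```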
Re: [R] Putting a bunch of Excel files as data.frames into a list fails
Try changing:

v_list_of_files[v_file]

to:

v_list_of_files[[v_file]]

Also, are you sure you are not generating warnings? For example:

l = list()
l["iris"] = iris

Also, you can change it to lapply(v_files, function(v_file){...}).

Have a good one,
Jeremiah

On Wed, Sep 28, 2016 at 8:02 AM, wrote:
> Hi All,
>
> I need to read a bunch of Excel files and store them in R.
>
> I decided to store the different Excel files as data.frames in a named
> list where the names are the file names of each file (which differs from
> the sources as far as I can see):
>
> -- cut --
> # Sources:
> # - http://stackoverflow.com/questions/11433432/importing-multiple-csv-files-into-r
> # - http://stackoverflow.com/questions/9564489/opening-all-files-in-a-folder-and-applying-a-function
> # - http://stackoverflow.com/questions/12945687/how-to-read-all-worksheets-in-an-excel-workbook-into-an-r-list-with-data-frame-e
>
> v_file_path <- "H:/2016/Analysen/Neukunden/Input"
> v_file_pattern <- "*.xlsx"
>
> v_files <- list.files(path = v_file_path,
>                       pattern = v_file_pattern,
>                       ignore.case = TRUE)
> print(v_files)
>
> v_list_of_files <- list()
>
> for (v_file in v_files) {
>   v_list_of_files[v_file] <- openxlsx::read.xlsx(file.path(v_file_path,
>                                                            v_file))
> }
>
> This code does not work because it stores only the first variable of each
> Excel file in the named list.
>
> What do I need to change to get it running?
>
> Kind regards
>
> Georg
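Putting the suggestions together, a sketch with lapply() (the path and the openxlsx package are taken from the original post):

```r
v_file_path <- "H:/2016/Analysen/Neukunden/Input"  # path from the post
v_files <- list.files(path = v_file_path, pattern = "\\.xlsx$",
                      ignore.case = TRUE)

# lapply() builds the list in one pass; each element is a whole data.frame,
# avoiding the partial [ ] assignment that kept only the first column.
v_list_of_files <- lapply(v_files, function(v_file) {
  openxlsx::read.xlsx(file.path(v_file_path, v_file))
})
names(v_list_of_files) <- v_files  # name each data.frame by its file
```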
Re: [R] why data.frame, mutate package and not lists
There is also this syntax for adding variables:

df[, "var5"] = 1:10

and the syntax sugar for row-oriented storage:

df[1:5, ]

On Wed, Sep 14, 2016 at 11:40 AM, jeremiah rounds <roundsjerem...@gmail.com> wrote:
> "If you want to add a variable to a data.frame you have to use attach, detach.
> Right?"
>
> Not quite. Use it like a list to add a variable to a data.frame, e.g.
>
> df = list()
> df$var1 = 1:10
> df = as.data.frame(df)
> df$var2 = 1:10
> df[["var3"]] = 1:10
> df
> df = as.list(df)
> df$var4 = 1:10
> as.data.frame(df)
>
> Ironically, the primary reason to use a data.frame in my head is to signal
> that you are thinking of your data as row-oriented tabular storage.
> "Ironic" because in technical detail that is not a requirement of a
> data.frame, but when I reflect on the typical way a seasoned R programmer
> approaches lists and data.frames, that is basically what they are
> communicating.
>
> I was going to post that a reason to use data.frames is to take advantage
> of optimizations and syntax sugar for data.frames, but in reality, if code
> does not assume a row-oriented data structure in a data.frame, there is not
> much I can think of that exists in the way of optimization. For example,
> we could point to "subset" and say that is a reason to use data.frames and
> not lists, but that only works if you use a data.frame in a conventional way.
>
> In the end, my advice to you is: if it is a table, make it a data.frame, and
> if it is not easily thought of as a table or row-oriented data structure,
> keep it as a list.
>
> Thanks,
> Jeremiah
>
> On Wed, Sep 14, 2016 at 11:15 AM, Alaios via R-help <r-help@r-project.org> wrote:
>> Thanks for all the answers. I think ggplot2 also requires data.frames. If
>> you want to add a variable to a data.frame you have to use attach, detach.
>> Right? Any more links that discuss those two different approaches?
>> Alex
>>
>> On Wednesday, September 14, 2016 5:34 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>
>> This is partially a matter of subjective opinion, and so pointless; but
>> I would point out that data frames are the canonical structure for a
>> great many of R's modeling and graphics functions, e.g. lm, xyplot,
>> etc.
>>
>> As for mutate() etc., that's about UIs and user-friendliness, and
>> imho my ho is meaningless.
>>
>> Best,
>> Bert
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>>
>> On Wed, Sep 14, 2016 at 6:01 AM, Alaios via R-help <r-help@r-project.org> wrote:
>> > Hi all, I have seen data.frames and operations from the mutate package
>> > getting really popular. In recent years I have been using lists
>> > extensively; is there any reason not to use lists, and to use other data
>> > types for data manipulation and storage?
>> > Any article that describes their differences? I would like to thank you
>> > for your reply.
>> > Regards,
>> > Alex
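To make the attach/detach point concrete: none of these ways of adding a column require attach() at all (the last line assumes dplyr is installed):

```r
df <- data.frame(var1 = 1:10)

df$var2   <- df$var1 * 2      # base R, $-assignment
df[["var3"]] <- df$var1 + 1   # base R, programmatic column name
df[, "var5"] <- 1:10          # base R, [-assignment

df <- dplyr::mutate(df, var4 = var1 + var2)  # dplyr equivalent

names(df)  # "var1" "var2" "var3" "var5" "var4"
```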
Re: [R] Help with strftime error "character string is not in a standard unambiguous format"
Not sure what the issue is with the provided code, but note:

library(lubridate)
lubridate::dmy_hm("Thu, 25 Aug 2016 6:34 PM")
[1] "2016-08-25 18:34:00 UTC"

Though if you go that route: set the TZ, because on the timestamp itself it is ambiguous.

On Sun, Sep 11, 2016 at 10:57 PM, Chris Evans wrote:
> I am trying to read activity data created by Garmin. It outputs dates like
> this:
>
> "Thu, 25 Aug 2016 6:34 PM"
>
> The problem that has stumped me is this:
>
> > strftime("Thu, 25 Aug 2016 6:34 PM", format="%a, %d %b %Y %I:%M %p")
> Error in as.POSIXlt.character(x, tz = tz) :
>   character string is not in a standard unambiguous format
>
> I _thought_ I had this running OK, but that error is catching me now. I
> think I've read ?strftime and written the format string correctly to match
> the input, but I'm stumped now.
>
> Can someone advise me? Many thanks in advance,
>
> Chris
>
> > sessionInfo()
> R version 3.3.1 (2016-06-21)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 10 x64 (build 10586)
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252
> [2] LC_CTYPE=English_United Kingdom.1252
> [3] LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.3.1 tools_3.3.1
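For the record, the base-R issue is that strftime() *formats* date-times: given character input it first calls as.POSIXlt(), which ignores the format string and fails. Parsing needs strptime() or as.POSIXct() (a sketch; %a, %b and %p are locale-dependent, so this assumes an English locale):

```r
# strptime() parses character input with a format; strftime() only formats.
x <- strptime("Thu, 25 Aug 2016 6:34 PM",
              format = "%a, %d %b %Y %I:%M %p", tz = "UTC")

format(x, "%Y-%m-%d %H:%M")  # "2016-08-25 18:34" in an English locale
```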
Re: [R] Time format lagging issue
Building on Don's example, here is something that looks a lot like what I do every day:

Sys.setenv(TZ="UTC")
mydf <- data.frame(t1=c('2011-12-31-22-30', '2011-12-31-23-30'))
library(lubridate)
mydf$timestamp = lubridate::ymd_hm(mydf$t1)
mydf$t2 = mydf$timestamp - period(minute=30)

On Wed, Aug 31, 2016 at 2:44 PM, MacQueen, Don wrote:
> Try following this example:
>
> mydf <- data.frame(t1=c('201112312230', '201112312330'))
> tmp1 <- as.POSIXct(mydf$t1, format='%Y%m%d%H%M')
> tmp2 <- tmp1 - 30*60
> mydf$t2 <- format(tmp2, '%Y%m%d%H%M')
>
> It can be made into a single line, but I used intermediate variables tmp1
> and tmp2 so that it would be easier to follow.
>
> Base R is more than adequate for this task.
>
> Please get rid of the asterisks in your next email. They just get in the
> way. Learn how to send plain-text email, not HTML email. Please.
>
> --
> Don MacQueen
>
> Lawrence Livermore National Laboratory
> 7000 East Ave., L-627
> Livermore, CA 94550
> 925-423-1062
>
> On 8/31/16, 9:07 AM, "R-help on behalf of Bhaskar Mitra" wrote:
> > Hello Everyone,
> >
> > I am trying to shift the time series in a dataframe (df) by 30 minutes.
> > My current format looks something like this:
> >
> > df$Time1
> > 201112312230
> > 201112312300
> > 201112312330
> >
> > I am trying to add an additional column of time (df$Time2) next to Time1
> > by lagging it by 30 minutes. Something like this:
> >
> > df$Time1         df$Time2
> > 201112312230     201112312200
> > 201112312300     201112312230
> > 201112312330     201112312300
> >
> > Based on some of the suggestions available, I have tried this option:
> >
> > require(zoo)
> > df1$Time2 <- lag(df1$Time1, -1, na.pad = TRUE)
> > View(df1)
> >
> > This does not, however, give me the desired result. I would appreciate
> > any suggestions/advice in this regard.
> >
> > Thanks,
> > Bhaskar
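Don's example, completed into the table the question asked for (a sketch; setting UTC avoids daylight-saving surprises when subtracting 30*60 seconds):

```r
mydf <- data.frame(t1 = c('201112312230', '201112312300', '201112312330'))

# Parse, shift back 30 minutes, and format back to the original style.
tmp <- as.POSIXct(mydf$t1, format = '%Y%m%d%H%M', tz = 'UTC')
mydf$t2 <- format(tmp - 30*60, '%Y%m%d%H%M')

mydf
#             t1           t2
# 1 201112312230 201112312200
# 2 201112312300 201112312230
# 3 201112312330 201112312300
```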
Re: [R] remove rows based on row mean
Oh, I forgot that I renamed sm:

dt = sm

library(data.table)
setDT(dt)
op = function(s){
  mean0 = apply(s, 1, mean)
  ret = s[which.max(mean0)]
  ret$mean = max(mean0)  # the winning mean (assigning the whole mean0 vector was a slip)
  ret
}
max_row = dt[, op(.SD), by = "Gene"]

Thanks,
Jeremiah

On Thu, Aug 18, 2016 at 3:21 PM, jeremiah rounds <roundsjerem...@gmail.com> wrote:
> library(data.table)
> setDT(dt)
> op = function(s){
>   mean0 = apply(s, 1, mean)
>   ret = s[which.max(mean0)]
>   ret$mean = mean0
>   ret
> }
> max_row = dt[, op(.SD), by = "Gene"]
>
> Thanks,
> Jeremiah
>
> On Thu, Aug 18, 2016 at 2:33 PM, Adrian Johnson <oriolebaltim...@gmail.com> wrote:
>> Hi Group,
>> I have a data matrix sm (dput code given below).
>>
>> I want to create a data matrix that keeps, among rows with the same
>> variable, the one with the higher mean.
>>
>> > sm
>>      Gene GSM529305 GSM529306 GSM529307 GSM529308
>> 1    A1BG      6.57      6.72      6.83      6.69
>> 2    A1CF      2.91      2.80      3.08      3.00
>> 3   A2LD1      5.82      7.01      6.62      6.87
>> 4     A2M      9.21      9.35      9.32      9.19
>> 5     A2M      2.94      2.50      3.16      2.76
>> 6  A4GALT      6.86      5.75      6.06      7.04
>> 7   A4GNT      3.97      3.56      4.22      3.88
>> 8    AAA1      3.39      2.90      3.16      3.23
>> 9    AAAS      8.26      8.63      8.40      8.70
>> 10   AAAS      6.82      7.15      7.33      6.51
>>
>> For example, rows 4 and 5 have the same variable Gene, A2M. I want to
>> select only the row with the higher mean. I wrote the following code,
>> which finds the duplicate rows with higher mean, but I cannot properly
>> write out the result. Could someone help? Thanks
>>
>> ugns <- unique(sm$Gene)
>>
>> exwidh = c()
>>
>> for(i in 1:length(ugns)){
>>   k = ugns[i]
>>   exwidh[i] <- sm[names(sort(rowMeans(sm[which(sm[,1]==k),2:ncol(sm)]),decreasing=TRUE)[1]),]
>> }
>>
>> structure(list(Gene = c("A1BG", "A1CF", "A2LD1", "A2M", "A2M",
>> "A4GALT", "A4GNT", "AAA1", "AAAS", "AAAS"), GSM529305 = c(6.57,
>> 2.91, 5.82, 9.21, 2.94, 6.86, 3.97, 3.39, 8.26, 6.82), GSM529306 = c(6.72,
>> 2.8, 7.01, 9.35, 2.5, 5.75, 3.56, 2.9, 8.63, 7.15), GSM529307 = c(6.83,
>> 3.08, 6.62, 9.32, 3.16, 6.06, 4.22, 3.16, 8.4, 7.33), GSM529308 = c(6.69,
>> 3, 6.87, 9.19, 2.76, 7.04, 3.88, 3.23, 8.7, 6.51)), .Names = c("Gene",
>> "GSM529305", "GSM529306", "GSM529307", "GSM529308"), row.names = c(NA,
>> 10L), class = "data.frame")
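A base-R sketch of the same result on the posted data (note that in the quoted op(), `ret$mean = mean0` assigns the whole vector of means to a one-row result; only the maximum is wanted):

```r
# sm is the data.frame from the dput() in the question.
sm$mean <- rowMeans(sm[, -1])           # row mean over the GSM columns

# Sort so the highest mean per Gene comes first, then keep the first row.
best <- sm[order(sm$Gene, -sm$mean), ]
best <- best[!duplicated(best$Gene), ]

nrow(best)  # 8 unique genes; e.g. A2M keeps the ~9.27-mean row, not ~2.84
```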
Re: [R] remove rows based on row mean
library(data.table) setDT(dt) op = function(s){ mean0 = apply(s, 1, mean) ret = s[which.max(mean0)] ret$mean = mean0 ret } max_row = dt[, op(.SD), by = "Gene"] Thanks, Jeremiah On Thu, Aug 18, 2016 at 2:33 PM, Adrian Johnsonwrote: > Hi Group, > I have a data matrix sm (dput code given below). > > I want to create a data matrix with rows with same variable that have > higher mean. > > > sm > Gene GSM529305 GSM529306 GSM529307 GSM529308 > 1A1BG 6.57 6.72 6.83 6.69 > 2A1CF 2.91 2.80 3.08 3.00 > 3 A2LD1 5.82 7.01 6.62 6.87 > 4 A2M 9.21 9.35 9.32 9.19 > 5 A2M 2.94 2.50 3.16 2.76 > 6 A4GALT 6.86 5.75 6.06 7.04 > 7 A4GNT 3.97 3.56 4.22 3.88 > 8AAA1 3.39 2.90 3.16 3.23 > 9AAAS 8.26 8.63 8.40 8.70 > 10 AAAS 6.82 7.15 7.33 6.51 > > For example in rows 4 and 5 have same variable Gene A2M. I want to > select only row that has higher mean. I wrote the following code that > gives me duplicate rows with higher mean but I cannot properly write > the result. Could someone help. Thanks > > ugns <- unique(sm$Gene) > > exwidh = c() > > for(i in 1:length(ugns)){ > k = ugns[i] > exwidh[i] <- sm[names(sort(rowMeans(sm[which(sm[,1]==k),2:ncol(sm)]), > decreasing=TRUE)[1]),] > } > > > > > > structure(list(Gene = c("A1BG", "A1CF", "A2LD1", "A2M", "A2M", > "A4GALT", "A4GNT", "AAA1", "AAAS", "AAAS"), GSM529305 = c(6.57, > 2.91, 5.82, 9.21, 2.94, 6.86, 3.97, 3.39, 8.26, 6.82), GSM529306 = c(6.72, > 2.8, 7.01, 9.35, 2.5, 5.75, 3.56, 2.9, 8.63, 7.15), GSM529307 = c(6.83, > 3.08, 6.62, 9.32, 3.16, 6.06, 4.22, 3.16, 8.4, 7.33), GSM529308 = c(6.69, > 3, 6.87, 9.19, 2.76, 7.04, 3.88, 3.23, 8.7, 6.51)), .Names = c("Gene", > "GSM529305", "GSM529306", "GSM529307", "GSM529308"), row.names = c(NA, > 10L), class = "data.frame") > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
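The data.table answer above can also be sketched in base R. This is a hedged sketch of my own (the variable names `ord` and `res` are not from the thread), using only the dput() data given above: sort rows by descending mean, then keep the first row seen per Gene.

```r
# Base-R sketch: for each Gene, keep the row with the highest mean across
# the sample columns. Data reconstructed from the dput() in the thread.
sm <- data.frame(
  Gene = c("A1BG", "A1CF", "A2LD1", "A2M", "A2M",
           "A4GALT", "A4GNT", "AAA1", "AAAS", "AAAS"),
  GSM529305 = c(6.57, 2.91, 5.82, 9.21, 2.94, 6.86, 3.97, 3.39, 8.26, 6.82),
  GSM529306 = c(6.72, 2.80, 7.01, 9.35, 2.50, 5.75, 3.56, 2.90, 8.63, 7.15),
  GSM529307 = c(6.83, 3.08, 6.62, 9.32, 3.16, 6.06, 4.22, 3.16, 8.40, 7.33),
  GSM529308 = c(6.69, 3.00, 6.87, 9.19, 2.76, 7.04, 3.88, 3.23, 8.70, 6.51),
  stringsAsFactors = FALSE
)

ord <- order(-rowMeans(sm[, -1]))              # highest-mean rows first
res <- sm[ord, ][!duplicated(sm$Gene[ord]), ]  # first hit per Gene wins
res[order(res$Gene), ]
```

For the A2M pair this keeps row 4 (mean about 9.27) and drops row 5, matching what the data.table version computes per group.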
Re: [R] Creating Dummy Var in R for regression?
Something like:

d = data.frame(score = sample(1:10, 100, replace=TRUE))
d$score_t = "low"
d$score_t[d$score > 3] = "medium"
d$score_t[d$score > 7] = "high"
d$score_t = factor(d$score_t, levels = c("low", "medium", "high"),
                   ordered=TRUE)  # set ordered = FALSE for dummy variables
X = model.matrix(~score_t, data=d)
X

On Fri, Aug 5, 2016 at 3:23 PM, Shivi Bhatia wrote:
> Thank you all for the assistance. This really helps.
>
> Hi Bert: While searching Nabble I learned that with factor variables in R
> there is no need to create dummy variables. However, please consider this
> situation: I am in the process of building a logistic regression model on
> NPS data. The outcome variable is CE, i.e. customer experience, which has
> 3 ratings, so ordinal logistic regression will be used. However, most of
> my variables are categorical. For instance, one of the variables is agent
> knowledge, which is on a 10-point scale.
>
> This agent knowledge is in turn rated on a 3-level scale: high, medium,
> low. Hence I need to group these 10 values into 3 groups, and then, as
> you suggested, I can enter them in the model directly without creating
> n-1 categories.
>
> I have worked on SAS extensively, hence found this a bit confusing.
>
> Thanks for the help.
>
> On Sat, Aug 6, 2016 at 2:30 AM, Bert Gunter wrote:
>
> > Just commenting on the email subject, not the content (which you have
> > already been helped with): there is no need to *ever* create a dummy
> > variable for regression in R if what you mean by this is what is
> > conventionally meant. R will create the model matrix with appropriate
> > "dummy variables" for factors as needed. See ?contrasts and ?C for
> > relevant details and/or consult an appropriate R tutorial.
> >
> > Of course, if this is not what you meant, then ignore.
> >
> > Cheers,
> > Bert
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along
> > and sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
> >
> > On Fri, Aug 5, 2016 at 1:49 PM, wrote:
> > > Hello,
> > >
> > > Your ifelse will never work because
> > > reasons$salutation=="Mr" & reasons$salutation=="Father" is always FALSE,
> > > and so is reasons$salutation=="Mrs" & reasons$salutation=="Miss".
> > > Try instead | (or), not & (and).
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > > Citando Shivi Bhatia:
> > >
> > >> Dear Team,
> > >>
> > >> I need help with the below code in R:
> > >>
> > >> gender_rec <- c('Dr','Father','Mr'=1, 'Miss','MS','Mrs'=2, 3)
> > >> reasons$salutation <- gender_rec[reasons$salutation]
> > >>
> > >> This code gives me the correct output but it overwrites the
> > >> reasons$salutation variable. I need to create a new variable, gender,
> > >> to capture gender details and leave salutation as it is.
> > >>
> > >> I tried the below syntax but it is converting all to 1:
> > >>
> > >> reasons$gender <- ifelse(reasons$salutation=="Mr" &
> > >>   reasons$salutation=="Father", "Male",
> > >>   ifelse(reasons$salutation=="Mrs" &
> > >>     reasons$salutation=="Miss", "Female", 1))
> > >>
> > >> Please suggest.
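Pulling the thread's advice together, here is a small sketch of the recode that writes to a new column and uses set membership instead of the broken `&` condition. The sample salutation values are my own illustration, not the poster's data:

```r
# Recode salutation into a NEW gender column, leaving salutation untouched.
# Sample data is illustrative only.
reasons <- data.frame(
  salutation = c("Mr", "Mrs", "Father", "Miss", "Dr", "MS"),
  stringsAsFactors = FALSE
)

# named lookup vector, in the spirit of the original post, but assigned
# to a new column rather than overwriting salutation
gender_rec <- c(Dr = "Male", Father = "Male", Mr = "Male",
                Miss = "Female", MS = "Female", Mrs = "Female")
reasons$gender <- unname(gender_rec[reasons$salutation])

# the corrected ifelse() per Rui's advice: a salutation can never equal
# two values at once, so test membership (%in%), not == with &
reasons$gender2 <- ifelse(reasons$salutation %in% c("Dr", "Father", "Mr"),
                          "Male",
                   ifelse(reasons$salutation %in% c("Miss", "MS", "Mrs"),
                          "Female", NA))
```

Both columns agree; the lookup-vector form scales better as the number of salutations grows.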
Re: [R] Reduce woes
Basically that example uses Reduce as an lapply, but I think that was
caused by how people started talking about things in the first place =)
But the point is the accumulator can be anything, as far as I can tell.

On Thu, Jul 28, 2016 at 12:14 PM, jeremiah rounds
<roundsjerem...@gmail.com> wrote:
> Re:
> "What I'm trying to work out is how to have the accumulator in Reduce
> not be the same type as the elements of the vector/list being reduced -
> ideally it could be an S3 instance, list, vector, or data frame."
>
> Pretty sure that restriction does not exist. See the code that follows.
> I would never solve this task in this way, though, so no comment on the
> use of Reduce for what you described. (Note the accumulation of
> "functions" in a list is just a demo of possibilities.) You could
> accumulate in an environment too and potentially gain a lot of copy
> efficiency.
>
> lookup = list()
> lookup[[as.character(1)]] = function() print("1")
> lookup[[as.character(2)]] = function() print("2")
> lookup[[as.character(3)]] = function() print("3")
>
> data = list(c(1,2), c(1,4), c(3,3), c(2,30))
>
> r = Reduce(function(acc, item) {
>   append(acc, list(lookup[[as.character(min(item))]]))
> }, data, list())
> r
> for(f in r) f()
>
> On Thu, Jul 28, 2016 at 5:09 AM, Stefan Kruger <stefan.kru...@gmail.com>
> wrote:
>
>> Ulrik - many thanks for your reply.
>>
>> I'm aware of many simple solutions such as the one you suggest, both
>> iterative and functional in style - but I'm trying to learn how to bend
>> Reduce() for the purpose of using it in more complex processing tasks.
>> What I'm trying to work out is how to have the accumulator in Reduce not
>> be the same type as the elements of the vector/list being reduced -
>> ideally it could be an S3 instance, list, vector, or data frame.
>>
>> Here's a more realistic example (in Elixir, sorry).
>>
>> Given two lists:
>>
>> 1. data: maps an id string to a vector of revision strings
>> 2. dict: maps known id/revision pairs as a string to true (or 1)
>>
>> find the items in data not already in dict, returned as a named list.
>>
>> ```elixir
>> data = %{
>>   "id1" => ["rev1.1", "rev1.2"],
>>   "id2" => ["rev2.1"],
>>   "id3" => ["rev3.1", "rev3.2", "rev3.3"]
>> }
>>
>> dict = %{
>>   "id1/rev1.1" => 1,
>>   "id1/rev1.2" => 1,
>>   "id3/rev3.1" => 1
>> }
>>
>> # Find the items in data not already in dict. Return as a grouped map
>>
>> Map.keys(data)
>> |> Enum.flat_map(fn id -> Enum.map(data[id], fn rev -> {id, rev} end) end)
>> |> Enum.filter(fn {id, rev} -> !Dict.has_key?(dict, "#{id}/#{rev}") end)
>> |> Enum.reduce(%{}, fn ({k, v}, d) -> Map.update(d, k, [v], &[v|&1]) end)
>> ```
>>
>> On 28 July 2016 at 12:03, Ulrik Stervbo <ulrik.ster...@gmail.com> wrote:
>>
>> > Hi Stefan,
>> >
>> > in that case, lapply(data, length) should do the trick.
>> >
>> > Best wishes,
>> > Ulrik
>> >
>> > On Thu, 28 Jul 2016 at 12:57 Stefan Kruger <stefan.kru...@gmail.com>
>> > wrote:
>> >
>> >> David - many thanks for your response.
>> >>
>> >> What I tried to do was to turn
>> >>
>> >> data <- list(one = c(1, 1), three = c(3), two = c(2, 2))
>> >>
>> >> into
>> >>
>> >> result <- list(one = 2, three = 1, two = 2)
>> >>
>> >> that is, creating a new list which has the same names as the first,
>> >> but where the values are the vector lengths.
>> >>
>> >> I know there are many other (and better) trivial ways of achieving
>> >> this - my aim is less the task itself, and more figuring out if this
>> >> can be done using Reduce() in the fashion I showed in the other
>> >> examples I gave. It's a building block of the map-filter-reduce type
>> >> pipelines that I'd like to understand how to do in R.
>> >>
>> >> Fumbling in the dark, I tried:
>> >>
>> >> Reduce(function(acc, item) { setNames(c(acc, length(data[item])), item) },
>> >>        names(data), accumulate=TRUE)
>> >>
>> >> but setNames sets all the names, not adding one - and acc is still a
>> >> vector, not a list.
Re: [R] Reduce woes
Re:
"What I'm trying to work out is how to have the accumulator in Reduce not
be the same type as the elements of the vector/list being reduced -
ideally it could be an S3 instance, list, vector, or data frame."

Pretty sure that restriction does not exist. See the code that follows. I
would never solve this task in this way, though, so no comment on the use
of Reduce for what you described. (Note the accumulation of "functions" in
a list is just a demo of possibilities.) You could accumulate in an
environment too and potentially gain a lot of copy efficiency.

lookup = list()
lookup[[as.character(1)]] = function() print("1")
lookup[[as.character(2)]] = function() print("2")
lookup[[as.character(3)]] = function() print("3")

data = list(c(1,2), c(1,4), c(3,3), c(2,30))

r = Reduce(function(acc, item) {
  append(acc, list(lookup[[as.character(min(item))]]))
}, data, list())
r
for(f in r) f()

On Thu, Jul 28, 2016 at 5:09 AM, Stefan Kruger wrote:
> Ulrik - many thanks for your reply.
>
> I'm aware of many simple solutions such as the one you suggest, both
> iterative and functional in style - but I'm trying to learn how to bend
> Reduce() for the purpose of using it in more complex processing tasks.
> What I'm trying to work out is how to have the accumulator in Reduce not
> be the same type as the elements of the vector/list being reduced -
> ideally it could be an S3 instance, list, vector, or data frame.
>
> Here's a more realistic example (in Elixir, sorry).
>
> Given two lists:
>
> 1. data: maps an id string to a vector of revision strings
> 2. dict: maps known id/revision pairs as a string to true (or 1)
>
> find the items in data not already in dict, returned as a named list.
>
> ```elixir
> data = %{
>   "id1" => ["rev1.1", "rev1.2"],
>   "id2" => ["rev2.1"],
>   "id3" => ["rev3.1", "rev3.2", "rev3.3"]
> }
>
> dict = %{
>   "id1/rev1.1" => 1,
>   "id1/rev1.2" => 1,
>   "id3/rev3.1" => 1
> }
>
> # Find the items in data not already in dict. Return as a grouped map
>
> Map.keys(data)
> |> Enum.flat_map(fn id -> Enum.map(data[id], fn rev -> {id, rev} end) end)
> |> Enum.filter(fn {id, rev} -> !Dict.has_key?(dict, "#{id}/#{rev}") end)
> |> Enum.reduce(%{}, fn ({k, v}, d) -> Map.update(d, k, [v], &[v|&1]) end)
> ```
>
> On 28 July 2016 at 12:03, Ulrik Stervbo wrote:
>
> > Hi Stefan,
> >
> > in that case, lapply(data, length) should do the trick.
> >
> > Best wishes,
> > Ulrik
> >
> > On Thu, 28 Jul 2016 at 12:57 Stefan Kruger wrote:
> >
> >> David - many thanks for your response.
> >>
> >> What I tried to do was to turn
> >>
> >> data <- list(one = c(1, 1), three = c(3), two = c(2, 2))
> >>
> >> into
> >>
> >> result <- list(one = 2, three = 1, two = 2)
> >>
> >> that is, creating a new list which has the same names as the first,
> >> but where the values are the vector lengths.
> >>
> >> I know there are many other (and better) trivial ways of achieving
> >> this - my aim is less the task itself, and more figuring out if this
> >> can be done using Reduce() in the fashion I showed in the other
> >> examples I gave. It's a building block of the map-filter-reduce type
> >> pipelines that I'd like to understand how to do in R.
> >>
> >> Fumbling in the dark, I tried:
> >>
> >> Reduce(function(acc, item) { setNames(c(acc, length(data[item])), item) },
> >>        names(data), accumulate=TRUE)
> >>
> >> but setNames sets all the names, not adding one - and acc is still a
> >> vector, not a list.
> >>
> >> It looks like lambda.tools::fold() and possibly purrr::reduce() aim at
> >> doing what I'd like to do - but I've not been able to figure out quite
> >> how.
> >>
> >> Thanks
> >>
> >> Stefan
> >>
> >> On 27 July 2016 at 20:35, David Winsemius wrote:
> >>
> >> > > On Jul 27, 2016, at 8:20 AM, Stefan Kruger wrote:
> >> > >
> >> > > Hi -
> >> > >
> >> > > I'm new to R.
> >> > >
> >> > > In other functional languages I'm familiar with, you can often
> >> > > seed a call to reduce() with a custom accumulator. Here's an
> >> > > example in Elixir:
> >> > >
> >> > > map = %{"one" => [1, 1], "three" => [3], "two" => [2, 2]}
> >> > > map |> Enum.reduce(%{}, fn ({k,v}, acc) -> Map.update(acc, k,
> >> > >   Enum.count(v), nil) end)
> >> > > # %{"one" => 2, "three" => 1, "two" => 2}
> >> > >
> >> > > In R terms that's reducing a list of vectors to become a new list
> >> > > mapping the names to the vector lengths.
> >> > >
> >> > > Even in JavaScript, you can do similar things:
> >> > >
> >> > > list = { one: [1, 1], three: [3], two: [2, 2] };
> >> > > var result = Object.keys(list).reduceRight(function (acc, item) {
> >> > >   acc[item] = list[item].length;
> >> > >   return acc;
> >> > > }, {});
> >> > > // result == { two: 2, three: 1, one: 2 }
> >> > >
> >> > > In R, from what I can gather, Reduce() is restricted such that any
> >> > > init
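For what it is worth, the named-list version Stefan was fumbling toward can be written directly: seed Reduce() with `init = list()` and fold over the names. This is my own sketch, not code from the thread:

```r
# Reduce() with a list accumulator of a different type than the items:
# fold over names(data), growing a named list of vector lengths.
data <- list(one = c(1, 1), three = c(3), two = c(2, 2))

result <- Reduce(function(acc, nm) {
  acc[[nm]] <- length(data[[nm]])   # add one name/length pair per step
  acc
}, names(data), init = list())

str(result)
```

This produces the same named list as the much simpler `lapply(data, length)`; the point, as in the thread, is only that the accumulator's type is whatever `init` is.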
Re: [R] Reducing execution time
Correction to my code. I created a "doc" variable because I was thinking
of doing something faster, but I never made the change. grep needed to
work on the original source "dat" to be used for counting. Fixed:

combs = structure(list(V1 = c(65L, 77L, 55L, 23L, 34L), V2 = c(23L, 34L,
  34L, 77L, 65L), V3 = c(77L, 65L, 23L, 34L, 55L)), .Names = c("V1",
  "V2", "V3"), class = "data.frame", row.names = c(NA, -5L))

dat = list(
  c(77,65,34,23,55, 65,23,77, 44),
  c(65,23,77,65,55,34, 77, 34,65, 10),
  c(77,34,65),
  c(55,78,56),
  c(98,23,77,65,34, 65, 23, 77, 34))

words = unlist(apply(combs, 1, function(d) paste(as.character(d), collapse=" ")))
dat = lapply(dat, function(d) paste(as.character(d), collapse=" "))
counts = sapply(words, function(w) length(grep(w, dat)))
names(counts) = words
counts
cbind(combs, data.frame(N = counts))

On Wed, Jul 27, 2016 at 11:27 AM, sri vathsan wrote:
> Hi,
>
> It is not just 79 triplets. As I said, there are 79 codes. I am making
> triplets out of those 79 codes and matching the triplets in the list.
>
> Please find the dput of the data below.
>
> > dput(head(newd,10))
> structure(list(uniq_id = c("1", "2", "3", "4", "5", "6", "7",
> "8", "9", "10"), hi = c("11, 22, 84, 85, 108, 111", "18, 84, 85,
> 87, 122, 134", "2, 18, 22", "18, 108, 122, 134, 176",
> "19, 85, 87, 100, 107", "79, 85, 111", "11, 88, 108", "19, 88, 96",
> "19, 85, 96", "19, 100, 103")), .Names = c("uniq_id", "hi"),
> row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
>
> I am trying to count the frequency of the triplets in the above data
> using the code below.
>
> # split column into a list
> myList <- strsplit(newd$hi, split=",")
> # get all three-way combinations
> myCombos <- t(combn(unique(unlist(myList)), 3))
> # count the instances where the triplet is present
> myCounts <- sapply(1:nrow(myCombos), FUN=function(i) {
>   sum(sapply(myList, function(j) {
>     sum(!is.na(match(c(myCombos[i,]), j)))}) == 3)})
> # final matrix
> final <- cbind(matrix(as.integer(myCombos), nrow(myCombos)), myCounts)
>
> I hope I made my point clear. Please let me know if I missed anything.
>
> Regards,
> Sri
>
> On Wed, Jul 27, 2016 at 11:19 PM, Sarah Goslee wrote:
>
> > You said you had 79 triplets and 8000 records.
> >
> > When I compared 100 triplets to 10000 records it took 86 seconds.
> >
> > So obviously there is something you're not telling us about the format
> > of your data.
> >
> > If you use dput() to provide actual examples, you will get better
> > results than if we on R-help have to guess. Because we tend to guess
> > in ways that make the most sense after extensive R experience, and
> > that's probably not what you have.
> >
> > Sarah
> >
> > On Wed, Jul 27, 2016 at 1:29 PM, sri vathsan wrote:
> > > Hi,
> > >
> > > Thanks for the solution. But I am afraid that after running this
> > > code it still takes more time. It has been an hour and it is still
> > > executing. I understand the delay, because each triplet has to be
> > > compared against almost 9000 elements.
> > >
> > > Regards,
> > > Sri
> > >
> > > On Wed, Jul 27, 2016 at 9:02 PM, Sarah Goslee wrote:
> > >>
> > >> Hi,
> > >>
> > >> It's really a good idea to use dput() or some other reproducible
> > >> way to provide data. I had to guess as to what your data looked
> > >> like.
> > >>
> > >> It appears that order doesn't matter?
> > >>
> > >> Given that, here's one approach:
> > >>
> > >> combs <- structure(list(V1 = c(65L, 77L, 55L, 23L, 34L),
> > >>   V2 = c(23L, 34L, 34L, 77L, 65L), V3 = c(77L, 65L, 23L, 34L, 55L)),
> > >>   .Names = c("V1", "V2", "V3"), class = "data.frame",
> > >>   row.names = c(NA, -5L))
> > >>
> > >> dat <- list(
> > >>   c(77,65,34,23,55),
> > >>   c(65,23,77,65,55,34),
> > >>   c(77,34,65),
> > >>   c(55,78,56),
> > >>   c(98,23,77,65,34))
> > >>
> > >> sapply(seq_len(nrow(combs)), function(i) sum(sapply(dat,
> > >>   function(j) all(combs[i,] %in% j))))
> > >>
> > >> On a dataset of comparable size to yours, it takes me under a
> > >> minute and a half.
> > >>
> > >> > combs <- combs[rep(1:nrow(combs), length=100), ]
> > >> > dat <- dat[rep(1:length(dat), length=10000)]
> > >> >
> > >> > dim(combs)
> > >> [1] 100 3
> > >> > length(dat)
> > >> [1] 10000
> > >> >
> > >> > system.time(test <- sapply(seq_len(nrow(combs)),
> > >> +   function(i) sum(sapply(dat, function(j) all(combs[i,] %in% j)))))
> > >>    user  system elapsed
> > >>  86.380   0.006  86.391
> > >>
> > >> On Wed, Jul 27, 2016 at 10:47 AM, sri vathsan wrote:
> > >> > Hi,
> > >> >
> > >> > Apologies for the lack of information.
> > >> >
> > >> > Basically, myCombos is a matrix with 3 variables; each row is a
> > >> > triplet drawn from 79 codes. There are around 3 lakh combinations
Re: [R] Reducing execution time
If I understood the request, this is the same programming task as counting
words in a document, counting character sequences in a string, or matching
bytes in byte arrays (though you don't want to go down that far). You can
do something like what follows. There are also vectorized greps in stringr.

combs = structure(list(V1 = c(65L, 77L, 55L, 23L, 34L), V2 = c(23L, 34L,
  34L, 77L, 65L), V3 = c(77L, 65L, 23L, 34L, 55L)), .Names = c("V1",
  "V2", "V3"), class = "data.frame", row.names = c(NA, -5L))

dat = list(
  c(77,65,34,23,55, 65,23,77, 44),
  c(65,23,77,65,55,34, 77, 34,65, 10),
  c(77,34,65),
  c(55,78,56),
  c(98,23,77,65,34, 65, 23, 77, 34))

words = unlist(apply(combs, 1, function(d) paste(as.character(d), collapse=" ")))
dat = lapply(dat, function(d) paste(as.character(d), collapse=" "))
doc = paste(dat, collapse = " ## ")  # just some arbitrary separator that isn't in your words
counts = sapply(words, function(w) length(grep(w, doc)))
names(counts) = words
counts
cbind(combs, data.frame(N = counts))

On Wed, Jul 27, 2016 at 8:32 AM, Sarah Goslee wrote:
> Hi,
>
> It's really a good idea to use dput() or some other reproducible way
> to provide data. I had to guess as to what your data looked like.
>
> It appears that order doesn't matter?
>
> Given that, here's one approach:
>
> combs <- structure(list(V1 = c(65L, 77L, 55L, 23L, 34L), V2 = c(23L,
>   34L, 34L, 77L, 65L), V3 = c(77L, 65L, 23L, 34L, 55L)),
>   .Names = c("V1", "V2", "V3"), class = "data.frame",
>   row.names = c(NA, -5L))
>
> dat <- list(
>   c(77,65,34,23,55),
>   c(65,23,77,65,55,34),
>   c(77,34,65),
>   c(55,78,56),
>   c(98,23,77,65,34))
>
> sapply(seq_len(nrow(combs)), function(i) sum(sapply(dat,
>   function(j) all(combs[i,] %in% j))))
>
> On a dataset of comparable size to yours, it takes me under a minute
> and a half.
>
> > combs <- combs[rep(1:nrow(combs), length=100), ]
> > dat <- dat[rep(1:length(dat), length=10000)]
> >
> > dim(combs)
> [1] 100 3
> > length(dat)
> [1] 10000
> >
> > system.time(test <- sapply(seq_len(nrow(combs)),
> +   function(i) sum(sapply(dat, function(j) all(combs[i,] %in% j)))))
>    user  system elapsed
>  86.380   0.006  86.391
>
> On Wed, Jul 27, 2016 at 10:47 AM, sri vathsan wrote:
> > Hi,
> >
> > Apologies for the lack of information.
> >
> > Basically, myCombos is a matrix with 3 variables; each row is a
> > triplet drawn from 79 codes. There are around 3 lakh combinations as
> > such, and it looks like below.
> >
> > V1 V2 V3
> > 65 23 77
> > 77 34 65
> > 55 34 23
> > 23 77 34
> > 34 65 55
> >
> > Each triplet is compared against a list (mylist) having 8177
> > elements, which looks like below.
> >
> > 77,65,34,23,55
> > 65,23,77,65,55,34
> > 77,34,65
> > 55,78,56
> > 98,23,77,65,34
> >
> > Now I want to count the number of occurrences of each triplet in the
> > above list. I.e., the triplet 65 23 77 is seen 3 times in the list.
> > So my output looks like below:
> >
> > V1 V2 V3 Freq
> > 65 23 77    3
> > 77 34 65    4
> > 55 34 23    2
> >
> > I hope I made it clear this time.
> >
> > On Wed, Jul 27, 2016 at 7:00 PM, Bert Gunter wrote:
> >
> >> Not entirely sure I understand, but match() is already vectorized,
> >> so you should be able to lose the sapply(). This would speed things
> >> up a lot. Please re-read ?match *carefully*.
> >>
> >> Bert
> >>
> >> On Jul 27, 2016 6:15 AM, "sri vathsan" wrote:
> >>
> >> Hi,
> >>
> >> I created a list of 3-combination numbers (mycombos, around 3 lakh
> >> combinations) and am counting the occurrence of those combinations
> >> in another list. This comparison list (mylist) has around 8000
> >> records. I am using the following code.
> >>
> >> myCounts <- sapply(1:nrow(myCombos), FUN=function(i) {
> >>   sum(sapply(myList, function(j) {
> >>     sum(!is.na(match(c(myCombos[i,]), j)))}) == 3)})
> >>
> >> The above code takes a very long time to execute; is there any more
> >> efficient method that will reduce the time?
> >> --
> >>
> >> Regards,
> >> Srivathsan.K
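A compact variant of Sarah's `%in%` approach (my own sketch, using a trimmed-down `combs`/`dat` so the counts match the Freq table quoted in the thread): deduplicate each record once, then test each triplet for set membership.

```r
# Count, for each triplet (row of combs), how many records in dat contain
# all three codes, ignoring order and repeats.
combs <- data.frame(V1 = c(65, 77, 55), V2 = c(23, 34, 34), V3 = c(77, 65, 23))
dat <- list(c(77, 65, 34, 23, 55),
            c(65, 23, 77, 65, 55, 34),
            c(77, 34, 65),
            c(55, 78, 56),
            c(98, 23, 77, 65, 34))

sets <- lapply(dat, unique)   # dedupe once so each %in% scan is shorter
combs$Freq <- apply(combs, 1, function(trip)
  sum(vapply(sets, function(s) all(trip %in% s), logical(1))))
combs
```

This reproduces the counts from the thread (3, 4 and 2). It is still O(triplets x records); for 3 lakh triplets the per-record work is what has to shrink, e.g. via the string-matching trick above or by indexing which records contain each code.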
Re: [R] C/C++/Fortran Rolling Window Regressions
I agree that, when appropriate, Kalman filtering/smoothing is the
higher-quality way to go about estimating a time-varying coefficient
(given that is what they do), and I have noted that both the R package
"dlm" and the function StructTS handle these problems quickly. I am
working on that in parallel. One of the things I am unsure about with
Kalman filters is how to estimate variance parameters when the process is
unusual in some way that isn't in the model and it is not feasible to
adjust the model by hand. dlm's dlmMLE seems to produce nonsense (not
because of the author's work but because of assumptions). At least with
moving-window regressions, after the unusual event is past your window
the influence of that event is gone. That isn't really a question for
this group; it is more about me reading more. When I get that "how to
handle all the strange things big data throws at you" worked out for
Kalman filters, I will go back to those, because I certainly like what I
see when everything is right. There is a plethora of related topics,
right? Bayesian model averaging, GARCH models for heteroscedasticity,
etc. Anyway... roll::roll_lm, cheers!

Thanks,
Jeremiah

On Thu, Jul 21, 2016 at 2:08 PM, Mark Leeds <marklee...@gmail.com> wrote:
> Hi Jeremiah: another possibly faster way would be to use a Kalman
> filtering framework. I forget the details, but Duncan and Horne have a
> paper which shows how a regression can be re-computed each time a new
> data point is added. I forget if they also handle taking one off of the
> back, which is what you need.
>
> The paper at the link below isn't the paper I'm talking about, but it's
> reference [1] in that paper. Note that this suggestion might not be a
> better approach than the various approaches already suggested, so I
> wouldn't go this route unless you're very interested.
>
> Mark
>
> https://www.le.ac.uk/users/dsgp1/COURSES/MESOMET/ECMETXT/recurse.pdf
>
> On Thu, Jul 21, 2016 at 4:28 PM, Gabor Grothendieck <
> ggrothendi...@gmail.com> wrote:
>
>> I would be careful about making assumptions regarding what is faster.
>> Performance tends to be nonintuitive.
>>
>> When I ran rollapply/lm, rollapply/fastLm and roll_lm on the example
>> you provided, rollapply/fastLm was three times faster than roll_lm. Of
>> course this could change with data of different dimensions, but it
>> would be worthwhile to do actual benchmarks before making assumptions.
>>
>> I also noticed that roll_lm did not give the same coefficients as the
>> other two.
>>
>> set.seed(1)
>> library(zoo)
>> library(RcppArmadillo)
>> library(roll)
>> z <- zoo(matrix(rnorm(10), ncol = 2))
>> colnames(z) <- c("y", "x")
>>
>> ## rolling regression of width 4
>> library(rbenchmark)
>> benchmark(fastLm = rollapplyr(z, width = 4,
>>     function(x) coef(fastLm(cbind(1, x[, 2]), x[, 1])),
>>     by.column = FALSE),
>>   lm = rollapplyr(z, width = 4,
>>     function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>>     by.column = FALSE),
>>   roll_lm = roll_lm(coredata(z[, 1, drop = F]), coredata(z[, 2, drop = F]), 4,
>>     center = FALSE))[1:4]
>>
>>      test replications elapsed relative
>> 1  fastLm          100    0.22    1.000
>> 2      lm          100    0.72    3.273
>> 3 roll_lm          100    0.64    2.909
>>
>> On Thu, Jul 21, 2016 at 3:45 PM, jeremiah rounds
>> <roundsjerem...@gmail.com> wrote:
>> > Thanks all. roll::roll_lm was essentially what I wanted. I think
>> > maybe I would prefer it to have options to return a few more things,
>> > but it is the coefficients, and the remaining statistics you might
>> > want can be calculated fast enough from there.
>> >
>> > On Thu, Jul 21, 2016 at 12:36 PM, Achim Zeileis <
>> achim.zeil...@uibk.ac.at> wrote:
>> >
>> >> Jeremiah,
>> >>
>> >> for this purpose there are the "roll" and "RcppRoll" packages. Both
>> >> use Rcpp and the former also provides rolling lm models. The latter
>> >> has a generic interface that lets you define your own function.
>> >>
>> >> One thing to pay attention to, though, is numerical reliability.
>> >> Especially on large time series with relatively short windows there
>> >> is a good chance of encountering numerically challenging situations.
>> >> The QR decomposition used by lm is fairly robust, while other more
>> >> straightforward matrix multiplications may not be.
Re: [R] C/C++/Fortran Rolling Window Regressions
I appreciate the timing, so much so that I changed the code to show the
issue. It is a problem of scale. roll_lm probably has a heavy start-up
cost, but otherwise it completely outperforms those other versions at
scale. I suspect you are timing the nearly constant start-up cost in
small data. I did give code to paint a picture, but it was just cartoon
code lifted from StackExchange. If you want to characterize the real
problem, it is closer to: 30-day rolling windows on 24 daily (by hour)
measurements for 5 years with 24+7-1 dummy predictor variables, and
finally you need to do this for 300 sets of data. Pseudo-code is closer
to what follows, and roll_lm can handle that input in a timely manner.
You can do it with lm.fit, but you need to spend a lot of time waiting.

The issue of accuracy needs a follow-up check. Not sure why it would be
different. Worth a check on that.

Thanks,
Jeremiah

library(rbenchmark)
N = 30*24*12*5
window = 30*24
npred = 15  # 15 chosen arbitrarily...
set.seed(1)
library(zoo)
library(RcppArmadillo)
library(roll)
x = matrix(rnorm(N*(npred+1)), ncol = npred+1)
colnames(x) <- c("y", paste0("x", 1:npred))
z <- zoo(x)

benchmark(
  roll_lm = roll_lm(coredata(z[, 1, drop = F]), coredata(z[, -1, drop = F]),
                    window, center = FALSE),
  replications=3)

Which comes out as:

     test replications elapsed relative user.self sys.self user.child sys.child
1 roll_lm            3   6.273        1    38.312    0.654          0         0

## You aren't going to get that below...
benchmark(fastLm = rollapplyr(z, width = window,
    function(x) coef(fastLm(cbind(1, x[, -1]), x[, 1])),
    by.column = FALSE),
  lm = rollapplyr(z, width = window,
    function(x) coef(lm(y ~ ., data = as.data.frame(x))),
    by.column = FALSE),
  replications=3)

On Thu, Jul 21, 2016 at 1:28 PM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:
> I would be careful about making assumptions regarding what is faster.
> Performance tends to be nonintuitive.
>
> When I ran rollapply/lm, rollapply/fastLm and roll_lm on the example
> you provided, rollapply/fastLm was three times faster than roll_lm. Of
> course this could change with data of different dimensions, but it
> would be worthwhile to do actual benchmarks before making assumptions.
>
> I also noticed that roll_lm did not give the same coefficients as the
> other two.
>
> set.seed(1)
> library(zoo)
> library(RcppArmadillo)
> library(roll)
> z <- zoo(matrix(rnorm(10), ncol = 2))
> colnames(z) <- c("y", "x")
>
> ## rolling regression of width 4
> library(rbenchmark)
> benchmark(fastLm = rollapplyr(z, width = 4,
>     function(x) coef(fastLm(cbind(1, x[, 2]), x[, 1])),
>     by.column = FALSE),
>   lm = rollapplyr(z, width = 4,
>     function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>     by.column = FALSE),
>   roll_lm = roll_lm(coredata(z[, 1, drop = F]), coredata(z[, 2, drop = F]), 4,
>     center = FALSE))[1:4]
>
>      test replications elapsed relative
> 1  fastLm          100    0.22    1.000
> 2      lm          100    0.72    3.273
> 3 roll_lm          100    0.64    2.909
>
> On Thu, Jul 21, 2016 at 3:45 PM, jeremiah rounds
> <roundsjerem...@gmail.com> wrote:
> > Thanks all. roll::roll_lm was essentially what I wanted. I think
> > maybe I would prefer it to have options to return a few more things,
> > but it is the coefficients, and the remaining statistics you might
> > want can be calculated fast enough from there.
> >
> > On Thu, Jul 21, 2016 at 12:36 PM, Achim Zeileis <
> achim.zeil...@uibk.ac.at> wrote:
> >
> >> Jeremiah,
> >>
> >> for this purpose there are the "roll" and "RcppRoll" packages. Both
> >> use Rcpp and the former also provides rolling lm models. The latter
> >> has a generic interface that lets you define your own function.
> >>
> >> One thing to pay attention to, though, is numerical reliability.
> >> Especially on large time series with relatively short windows there
> >> is a good chance of encountering numerically challenging situations.
> >> The QR decomposition used by lm is fairly robust, while other more
> >> straightforward matrix multiplications may not be. This should be
> >> kept in mind when writing your own Rcpp code for plugging it into
> >> RcppRoll.
> >>
> >> But I haven't checked what the roll package does and how reliable
> >> that is...
> >>
> >> hth,
> >> Z
> >>
> >> On Thu, 21 Jul 2016, jeremiah rounds wrote:
> >>
> >>> Hi,
> >>>
> >>> A not unusual task is performing a multiple regression in a rolling
> >>> window on a time-series.
Re: [R] C/C++/Fortran Rolling Window Regressions
Thanks all. roll::roll_lm was essentially what I wanted. I think maybe I would prefer it to have options to return a few more things, but it is the coefficients, and the remaining statistics you might want can be calculated fast enough from there. On Thu, Jul 21, 2016 at 12:36 PM, Achim Zeileis <achim.zeil...@uibk.ac.at> wrote: > Jeremiah, > > for this purpose there are the "roll" and "RcppRoll" packages. Both use > Rcpp and the former also provides rolling lm models. The latter has a > generic interface that let's you define your own function. > > One thing to pay attention to, though, is the numerical reliability. > Especially on large time series with relatively short windows there is a > good chance of encountering numerically challenging situations. The QR > decomposition used by lm is fairly robust while other more straightforward > matrix multiplications may not be. This should be kept in mind when writing > your own Rcpp code for plugging it into RcppRoll. > > But I haven't check what the roll package does and how reliable that is... > > hth, > Z > > > On Thu, 21 Jul 2016, jeremiah rounds wrote: > > Hi, >> >> A not unusual task is performing a multiple regression in a rolling window >> on a time-series.A standard piece of advice for doing in R is >> something >> like the code that follows at the end of the email. I am currently using >> an "embed" variant of that code and that piece of advice is out there too. >> >> But, it occurs to me that for such an easily specified matrix operation >> standard R code is really slow. rollapply constantly returns to R >> interpreter at each window step for a new lm. All lm is at its heart is >> (X^t X)^(-1) * Xy, and if you think about doing that with Rcpp in rolling >> window you are just incrementing a counter and peeling off rows (or >> columns >> of X and y) of a particular window size, and following that up with some >> matrix multiplication in a loop. 
>> The pseudo-code for that Rcpp practically writes itself, and you might
>> want a wrapper of something like: rolling_lm(y=y, x=x, width=4).
>>
>> My question is this: have any of the thousands of R packages out there
>> published anything like that? Rolling window multiple regressions that
>> stay in C/C++ until the rolling window completes? No sense in writing it
>> if it exists.
>>
>> Thanks,
>> Jeremiah
>>
>> Standard (slow) advice for "rolling window regression" follows:
>>
>> set.seed(1)
>> z <- zoo(matrix(rnorm(10), ncol = 2))
>> colnames(z) <- c("y", "x")
>>
>> ## rolling regression of width 4
>> rollapply(z, width = 4,
>>           function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>>           by.column = FALSE, align = "right")
>>
>> ## result is identical to
>> coef(lm(y ~ x, data = z[1:4,]))
>> coef(lm(y ~ x, data = z[2:5,]))

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
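For reference, a minimal sketch of the roll::roll_lm call discussed above, applied to the same toy zoo series. The argument names (x, y, width) follow the roll package's interface, but the exact shape of the returned object may differ across package versions, so treat this as illustrative rather than definitive.

```r
## Sketch: the rolling regression done in compiled code via roll::roll_lm.
## Assumes the 'roll' and 'zoo' packages are installed.
library(zoo)
library(roll)

set.seed(1)
z <- zoo(matrix(rnorm(10), ncol = 2))
colnames(z) <- c("y", "x")

## rolling regression of width 4, staying in C++ across windows
fit <- roll_lm(x = coredata(z)[, "x", drop = FALSE],
               y = coredata(z)[, "y", drop = FALSE],
               width = 4)

fit$coefficients  # one row of (intercept, slope) per window end
```

Rows before the first complete window come back as NA, matching the align = "right" behaviour of the rollapply version.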
[R] C/C++/Fortran Rolling Window Regressions
Hi,

A not unusual task is performing a multiple regression in a rolling window on a time series. A standard piece of advice for doing this in R is something like the code that follows at the end of this email. I am currently using an "embed" variant of that code, and that piece of advice is out there too.

But it occurs to me that for such an easily specified matrix operation, standard R code is really slow: rollapply returns to the R interpreter at each window step for a new lm. At its heart, lm is just (X^t X)^(-1) X^t y, and if you think about doing that with Rcpp in a rolling window, you are just incrementing a counter and peeling off rows (or columns) of X and y of a particular window size, and following that up with some matrix multiplication in a loop. The pseudo-code for that Rcpp practically writes itself, and you might want a wrapper of something like: rolling_lm(y=y, x=x, width=4).

My question is this: have any of the thousands of R packages out there published anything like that? Rolling window multiple regressions that stay in C/C++ until the rolling window completes? No sense in writing it if it exists.

Thanks,
Jeremiah

Standard (slow) advice for "rolling window regression" follows:

set.seed(1)
z <- zoo(matrix(rnorm(10), ncol = 2))
colnames(z) <- c("y", "x")

## rolling regression of width 4
rollapply(z, width = 4,
          function(x) coef(lm(y ~ x, data = as.data.frame(x))),
          by.column = FALSE, align = "right")

## result is identical to
coef(lm(y ~ x, data = z[1:4,]))
coef(lm(y ~ x, data = z[2:5,]))
Re: [R] if + is.na
Your error message is because if wants a single value and you are giving it a vector. Typically you want to use the functions all or any to correct this (look them up: ?all, ?any), e.g. if(any(is.na(...))).

But in this case, to accomplish the task you're after, I don't think you even want an if. I am not going to give you precise code, because I wasn't able to decipher exactly what you were trying to do, but something like:

b[is.na(a)] <- 43

might be helpful. This line puts a 43 in each entry of b whose corresponding entry in a is NA.

Good luck!

Date: Sun, 14 Jun 2009 12:48:58 -0700
From: gregori...@gmail.com
To: r-help@r-project.org
Subject: [R] if + is.na

Hello! I want to use the function is.na(). I have two vectors:

a <- c(1,NA,3,3,3)
b <- c(0,0,0,0,0)

and when I use the is.na function it's OK:

is.na(a)
[1] FALSE  TRUE FALSE FALSE FALSE

but I would like to create something like this:

for (i in 1:length(a)) {
  if (wsp[i] == is.na(a)) {b=43}
}

or like this:

if(is.na(a)) b=3 else a
[1]  1 NA  3  3  3

but I always get an error: the condition has length > 1 and only the first element will be used. Could you help me with how I may avoid this problem and use the function is.na inside if - else? Please.

--
View this message in context: http://www.nabble.com/if-%2B-is.na-tp24025136p24025136.html
Sent from the R help mailing list archive at Nabble.com.
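To make the suggestion above concrete, here is a short self-contained sketch of the vectorised replacement (no if, no loop), using the poster's own a and b:

```r
a <- c(1, NA, 3, 3, 3)
b <- c(0, 0, 0, 0, 0)

## put a 43 in b wherever a is NA -- logical indexing replaces the loop
b[is.na(a)] <- 43
b
# [1]  0 43  0  0  0

## if() needs a single TRUE/FALSE, so collapse the vector with any()/all()
if (any(is.na(a))) cat("a contains at least one NA\n")
```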
Re: [R] help to speed up loops in r
Other than reengineering the approach, one thing that helps is: don't index rows of data frames in loops... ever. It is actually faster to convert to a matrix, do the operations, and then convert back to a data frame if you have to. As an example, I have your code in a function:

foo = function(averagedreplicates, zz){
  iindex = 1:(dim(averagedreplicates)[2])
  for (i in iindex) {
    cat(i,'\n')
    #calculates Means
    #Sample A
    averagedreplicates[i,2] <- (zz[i,2] + zz[i,3])/2
    averagedreplicates[i,3] <- (zz[i,4] + zz[i,5])/2
    averagedreplicates[i,4] <- (zz[i,6] + zz[i,7])/2
    averagedreplicates[i,5] <- (zz[i,8] + zz[i,9])/2
    averagedreplicates[i,6] <- (zz[i,10] + zz[i,11])/2
    #Sample B
    averagedreplicates[i,7] <- (zz[i,12] + zz[i,13])/2
    averagedreplicates[i,8] <- (zz[i,14] + zz[i,15])/2
    averagedreplicates[i,9] <- (zz[i,16] + zz[i,17])/2
    averagedreplicates[i,10] <- (zz[i,18] + zz[i,19])/2
    averagedreplicates[i,11] <- (zz[i,20] + zz[i,21])/2
    #Sample C
    averagedreplicates[i,12] <- (zz[i,22] + zz[i,23])/2
    averagedreplicates[i,13] <- (zz[i,24] + zz[i,25])/2
    averagedreplicates[i,14] <- (zz[i,26] + zz[i,27])/2
    averagedreplicates[i,15] <- (zz[i,28] + zz[i,29])/2
    averagedreplicates[i,16] <- (zz[i,30] + zz[i,31])/2
    #Sample D
    averagedreplicates[i,17] <- (zz[i,32] + zz[i,33])/2
    averagedreplicates[i,18] <- (zz[i,34] + zz[i,35])/2
    averagedreplicates[i,19] <- (zz[i,36] + zz[i,37])/2
    averagedreplicates[i,20] <- (zz[i,38] + zz[i,39])/2
    averagedreplicates[i,21] <- (zz[i,40] + zz[i,41])/2
  }
  return(averagedreplicates)
}

I then make matrix and data.frame versions of things similar in size to what you are working with:

zz.as.m = matrix(runif(95000*41),95000,41)
zz.as.df = as.data.frame(zz.as.m)
ar.as.m = matrix(0,95000,21)
ar.as.df = as.data.frame(ar.as.m)

And we can time the matrix version:

start = Sys.time()
x = foo(ar.as.m,zz.as.m)
stop = Sys.time()
stop-start # .06 seconds for me

And on the data frame versions?
#using the data frame versions
start = Sys.time()
x = foo(ar.as.df,zz.as.df)
stop = Sys.time()
stop-start # 31 seconds for me

For me it takes 516 times as long to do the same work in data frames as it would have taken in matrices. People say never use loops in R, and I wish they wouldn't say it like that, because it distracts from the fact of the matter, which is that sometimes looping in R is quite reasonably fast. And sometimes... like when you are indexing rows of a data frame... it is horrible. These are the little things I learned combing through my Masters project for speed. The only caveat of following this advice of always doing this sort of work in matrices is that it can be a little time-consuming (developer time) repairing factors. But in terms of code run time, it is absolutely essential to use the right data structure for the job.

Hope this is of assistance,
Jeremiah Rounds

Date: Mon, 8 Jun 2009 15:45:40 +
From: amitrh...@yahoo.co.uk
To: r-help@r-project.org
Subject: [R] help to speed up loops in r

Hi, I am using a script which involves the following loop. It attempts to reduce a data frame (zz) of 95000 x 41 down to a data frame (averagedreplicates) of 95000 x 21 by averaging the replicate values, as you can see in the script below. This script however is very slow (2 days). Any suggestions to speed it up? NB: I have also tried using rowMeans rather than adding the 2 values and dividing by 2.
(same problem)

#SCRIPT STARTS
for (i in 1:length(averagedreplicates[,1]))
#for (i in 1:dim(averagedreplicates)[1])
{
  cat(i,'\n')
  #calculates Means
  #Sample A
  averagedreplicates[i,2] <- (zz[i,2] + zz[i,3])/2
  averagedreplicates[i,3] <- (zz[i,4] + zz[i,5])/2
  averagedreplicates[i,4] <- (zz[i,6] + zz[i,7])/2
  averagedreplicates[i,5] <- (zz[i,8] + zz[i,9])/2
  averagedreplicates[i,6] <- (zz[i,10] + zz[i,11])/2
  #Sample B
  averagedreplicates[i,7] <- (zz[i,12] + zz[i,13])/2
  averagedreplicates[i,8] <- (zz[i,14] + zz[i,15])/2
  averagedreplicates[i,9] <- (zz[i,16] + zz[i,17])/2
  averagedreplicates[i,10] <- (zz[i,18] + zz[i,19])/2
  averagedreplicates[i,11] <- (zz[i,20] + zz[i,21])/2
  #Sample C
  averagedreplicates[i,12] <- (zz[i,22] + zz[i,23])/2
  averagedreplicates[i,13] <- (zz[i,24] + zz[i,25])/2
  averagedreplicates[i,14] <- (zz[i,26] + zz[i,27])/2
  averagedreplicates[i,15] <- (zz[i,28] + zz[i,29])/2
  averagedreplicates[i,16] <- (zz[i,30] + zz[i,31])/2
  #Sample D
  averagedreplicates[i,17] <- (zz[i,32] + zz[i,33])/2
  averagedreplicates[i,18] <- (zz[i,34] + zz[i,35])/2
  averagedreplicates[i,19] <- (zz[i,36] + zz[i,37])/2
  averagedreplicates[i,20] <- (zz[i,38] + zz[i,39])/2
  averagedreplicates[i,21] <- (zz[i,40] + zz[i,41])/2
}
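Along the same lines, the loop can be removed entirely: each output column is just the mean of an adjacent pair of zz columns, so all pairs can be averaged in one vectorised statement. This sketch assumes, as the loop does, that only columns 2 through 21 of the result are filled; whatever belongs in column 1 is left untouched.

```r
## Vectorised replacement for the row-by-row averaging loop:
zz <- matrix(runif(95000 * 41), 95000, 41)    # stand-in for the real data
averagedreplicates <- matrix(0, nrow(zz), 21)

## columns 2..21 of the result = mean of zz columns (2,3), (4,5), ..., (40,41)
averagedreplicates[, 2:21] <-
  (zz[, seq(2, 40, by = 2)] + zz[, seq(3, 41, by = 2)]) / 2
```

This runs in a fraction of a second on a 95000 x 41 input, with no loop at all.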
Re: [R] how to randomly eliminate half the entries in a vector?
Here is what I've got for a script through your third question:

set.seed(1)
x1 = rbinom(200,1,.5)
x2 = rbinom(200,1,.5)
differ = x1 != x2
differ.indexes = (1:length(x1))[differ == TRUE]
# you were unclear whether you want to round up or round down on an odd count of differing indexes
n = floor(length(differ.indexes)/2)
# sampling without replacement
random.indexes = sample(differ.indexes, n)
swapping = x1[random.indexes] # with 1s and 0s you can do this without this variable
x1[random.indexes] = x2[random.indexes]
x2[random.indexes] = swapping

Good luck,
Jeremiah

Date: Tue, 17 Feb 2009 20:17:51 -0500
From: esmail...@gmail.com
To: r-help@r-project.org
Subject: [R] how to randomly eliminate half the entries in a vector?

Hello all,

I need some help with a nice R-idiomatic and efficient solution to a small problem. Essentially, I am trying to randomly eliminate half of the entries in a vector that contains index values into some other vectors.

More details: I am working with two strings/vectors of 0s and 1s. These will contain about 200 elements (always the same number for both). I want to:

1. determine the locations where the two strings differ -- easy using xor(s1, s2)

2. *randomly* select *half* of those positions -- not sure how to do this. I suppose the result would be a list of index positions of size sum(xor(s1, s2))/2

3. exchange (flip) the bits in those random positions for both strings -- I have something that seems to do that, but it doesn't look slick and I wonder how efficient it is.

Mostly I need help with #2, but will happily accept suggestions for #3, or for that matter anything that looks odd. Below is my partial solution... the HUX function is what I am trying to finish, if someone can point me in the right direction.

Thanks,
Esmail

--

rm(list=ls())

# create a binary vector of size len
create_bin_Chromosome <- function(len) {
  sample(0:1, len, replace=T)
}

# HUX - half uniform crossover
#
# 1. determines the locations of where the two strings
#    differ (easy xor)
#
# 2.
#    randomly selects half of those positions
#
# 3. exchanges (flips) the bits in those positions for
#    both
#
HUX <- function(b1, b2) {
  # 1. find differing bits
  r = xor(b1, b2)
  # positions where bits differ
  different = which(r==TRUE)
  cat("\nhrp: ", different, "\n")

  # 2. ??? how to do this best so that each time
  # a different half subset is selected? I.e.,
  # sum(r)/2 positions.

  # 3. this flips *all* positions, should really only flip
  # half of them (randomly selected half)
  new_b1 = b1
  new_b2 = b2
  for (i in different) # should contain half the entries (randomly)
  {
    new_b1[i] = b2[i]
    new_b2[i] = b1[i]
  }
  result <- matrix(c(new_b1, new_b2), 2, LEN, byrow=T)
  result
}

LEN = 5
b1 = create_bin_Chromosome(LEN)
b2 = create_bin_Chromosome(LEN)
cat(b1, "\n")
cat(b2, "\n")
idx = HUX(b1, b2)
cat("\n\n")
cat(idx[1,], "\n")
cat(idx[2,], "\n")
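For steps 2 and 3 specifically, sample() over the differing positions does the trick. A compact sketch (the variable names here are my own, not part of Esmail's original code):

```r
set.seed(42)
b1 <- sample(0:1, 200, replace = TRUE)
b2 <- sample(0:1, 200, replace = TRUE)

## step 1: positions where the strings differ
different <- which(b1 != b2)

## step 2: randomly pick half of them (rounding down on an odd count)
pick <- sample(different, floor(length(different) / 2))

## step 3: swap the bits at those positions in both vectors
tmp      <- b1[pick]
b1[pick] <- b2[pick]
b2[pick] <- tmp
```

Since the chosen positions held differing 0/1 bits, the swap flips both strings at exactly those positions.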
Re: [R] How to paste graph from R in Latex?
For school work I use png. PNG files are more efficient size/quality-wise than ps, and also lend themselves to more generic application/viewing than ps. In R this typically takes the form of:

setwd(...) # set working directory before starting any work, typically at the top of scripts
... # stuff
png(filename, height=800, width=800)
# graphical commands
dev.off()

One of the great things about the png command is the size formatting. One great trick is to increase the size of the plotting area, plot, and then in LaTeX shrink the graphic down. There are a lot of graphics where this makes everything look better with very little work, due to everything drawing at a finer resolution (in some lossy sense).

In your LaTeX you will want to use the package epsfig, because under Windows the png bounding box info isn't what default LaTeX packages expect, and epsfig can fix that easily. Typically this has the form:

\usepackage{epsfig}
\begin{document}
\begin{figure}[!htbp]
\center
\caption{Jittered pairs plot of severity predictors colored by red is severity 1.}
\label{bcpairs}
\epsfig{file=bcpairs.png, bb= 0 0 800 800, width=5.25in, clip=}
\end{figure}
\end{document}

The key line is \epsfig. bb= is the bounding box, which corresponds to whatever you had in the png command in R. width is where you resize it: you supply the width and the package will rescale it 1 to 1.

There are two tricks I picked up in my travels using this for homework. Well, there are three, but I don't have an example of the 3rd handy (side-by-side subfigures). One is clipping a figure to get rid of a piece of it. That is as simple as changing the bb command to only bound the parts you want. The other is shifting the graphic into the left margin a little bit. Handy for using the entire page on some graphics that just aren't easy to make any smaller.
That is done like so:

\begin{figure}[tbp]
\caption{Wine data pairs plots colored by cultivar.}
\label{winepairs}
\begin{minipage}{9in}
\hspace{-.75in}
\epsfig{file=ex2pairs.png, bb= 0 0 1200 1200, width=7in, clip=}
\end{minipage}
\end{figure}

The key there is that you start a minipage and then shift it to the left. Note here the command in R was:

png("ex2pairs.png", height=1200, width=1200)

for a large scatterplot. A large scatterplot is an example of something that often looks better painted at a higher resolution, saved, and then shrunk down.

-

Someone mentioned Sweave. Sweave's value really depends on who you are and what you're doing. Its work cycle is not appropriate for students or anyone that needs rapid-cycle prototyping, imo. Its great flaw is that it does not work well with the cycle of changing a little something, looking at the results in R, then changing a little something in LaTeX and looking at the results in dvi, repeated over and over again. The reason is that it has to repeat far too much work in each cycle, oftentimes repeating long calculations.

With the system above, you open a script in Tinn-R. You run it. You have your texmaker open. You compile your document. You don't like the graphic. You make your change to the plotting in your script. You highlight it and send it to R. You open it in a graphics viewer via double click, or you simply compile your LaTeX document again. Check it. Sweave is not at all friendly to that check-your-work-as-you-go mentality. It really needs a graphical interface that lets you indicate what not to redo, and just redo things incrementally.

Date: Fri, 16 May 2008 18:24:00 -0700
From: [EMAIL PROTECTED]
To: R-help@r-project.org
Subject: [R] How to paste graph from R in Latex?

Dear R-expert,

Is it possible to save a graph from R into a Latex document? I can see save as metafile, PNG, pdf etc, but I'm not sure which one to use. Thank you so much for your help.
Re: [R] heatmap on pre-established hclust output?
To: [EMAIL PROTECTED]
From: [EMAIL PROTECTED]
Date: Fri, 16 May 2008 17:55:26 +0200
Subject: [R] heatmap on pre-established hclust output?

Hi, can someone please guide me towards how to produce heatmap output from the output of an hclust run prior to the actual heatmap call? I have some rather lengthy clustering going on, and tweaking the visual output with heatmap recalculating the clustering every time is not feasible. Thanks, Joh

I can't say that I have actually tackled this, but I have some experience with the functions you mentioned. heatmap takes an hclustfun function parameter, so you can supply a custom clustering function. I don't believe there is a rule that says you have to do actual work in that function call: look at just returning the results of your more complicated clustering from that call, without actually redoing the calculations.

Jeremiah Rounds
Graduate Student
Utah State University
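A minimal sketch of that trick with made-up data: the supplied "clustering function" simply ignores the distance matrix heatmap passes in and hands back the precomputed hclust object.

```r
## Precompute the (expensive) clustering once...
x <- matrix(rnorm(100), 10, 10)
rownames(x) <- colnames(x) <- paste0("g", 1:10)
hc <- hclust(dist(x))

## ...then let heatmap "cluster" by returning the cached result.
heatmap(x, hclustfun = function(d) hc)
```

Note that heatmap calls hclustfun for both the row and the column dendrograms, so this shortcut only makes sense when the same ordering applies to both (as with a square symmetric layout); otherwise the Rowv/Colv arguments let you control the two sides separately, e.g. Rowv = as.dendrogram(hc).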
Re: [R] strip white in character strings
Date: Wed, 14 May 2008 12:06:39 -0400
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [R] strip white in character strings

Dear all, I have several datasets and I want to generate pdf plots from them. I also want to generate the names of the files automatically. They are country-specific, and the element mycurrentdata[1,1] contains this information. So what I do is something like this:

pdf(file=paste(mycurrentdata[1,1], ".pdf", sep=""), width=...etc)

The only problem I have is that some of the country names contain white space (e.g., United Kingdom). This is no problem for generating the pdf plots, but it may become problematic during further processing (e.g. including the plots in LaTeX documents). Is there an easy function to strip white space out of character strings (similar to the strip.white=TRUE option in read.table/scan)?

How about:

a <- "United Kingdom"
paste(unlist(strsplit(a, split=" ")), collapse="")
[1] "UnitedKingdom"

Note: it might be better to use generic trimming functions after the split, to catch any leftover non-space whitespace in each piece.

I'd appreciate any kind of help and I hope I did not miss anything completely obvious. Thanks, Roland
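An even shorter route to the same result is a single gsub() call with a POSIX whitespace character class, which also catches tabs and other non-space whitespace in one go:

```r
a <- "United Kingdom"

## delete every whitespace character (space, tab, newline, ...)
gsub("[[:space:]]", "", a)
# [1] "UnitedKingdom"
```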
Re: [R] Newbie question about vector matrix multiplication
Date: Wed, 14 May 2008 15:18:32 -0400
From: [EMAIL PROTECTED]
To: r-help@r-project.org
Subject: [R] Newbie question about vector matrix multiplication

Hello All,

I have a covariance matrix, generated by read.table and cov:

co <- cov(read.table("c:/r.x"))
             X            Y            Z
X 0.0012517684 0.0002765438 0.0007887114
Y 0.0002765438 0.0002570286 0.0002117336
Z 0.0007887114 0.0002117336 0.0009168750

And a weight vector generated by:

w <- read.table("c:/r.weights")
          X         Y         Z
1 0.5818416 0.2158531 0.2023053

I want to compute the product of the matrix and vectors termwise to generate a 3x3 matrix, where m[i,j] = w[i]*co[i,j]*w[j]:

0.000423773 7.47216E-08 4.41255E-08
7.47216E-08 1.96566E-11 4.29229E-11
4.41255E-08 4.29229E-11 4.11045E-11

First off, your example matrix does not seem to represent the equation you wrote down. For example, m[1,3] should be m[1,3] = 0.5818416 * 0.0007887114 * 0.2023053 = .0000928. I apologize if that represents something incorrect on my part. However, if I am correct, then I believe what you seek is the line below:

m = w %*% t(w) * co

To get there, btw, picture moving the weights together, then picture multiplying two equal-sized matrices together coefficient by coefficient.

Is this possible without writing explicit loops?

Thank you,
Dan Stanger
Eaton Vance Management
200 State Street
Boston, MA 02109
617 598 8261
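A small numeric check of that identity, using Dan's printed values. Note it assumes w is a plain numeric vector; a one-row data frame straight from read.table would need unlist() first (an assumption on my part about the actual data).

```r
co <- matrix(c(0.0012517684, 0.0002765438, 0.0007887114,
               0.0002765438, 0.0002570286, 0.0002117336,
               0.0007887114, 0.0002117336, 0.0009168750),
             3, 3, byrow = TRUE)
w <- c(0.5818416, 0.2158531, 0.2023053)

m1 <- (w %*% t(w)) * co   # outer product of the weights, then termwise product
m2 <- outer(w, w) * co    # equivalent and arguably clearer

## both give m[i,j] = w[i] * co[i,j] * w[j]
stopifnot(isTRUE(all.equal(m1, m2, check.attributes = FALSE)))
m1[1, 3]                  # 0.5818416 * 0.0007887114 * 0.2023053, about 9.28e-05
```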
Re: [R] array dimension changes with assignment
Why does the assignment of a 3178x93 object to another 3178x93 object remove the dimension attribute?

GT <- array(dim = c(6, nrow(InData), ncol(InSNPs)))
dim(GT)
[1]    6 3178   93
SNP1 <- InSNPs[InData[,C1],]
dim(SNP1)
[1] 3178   93
SNP2 <- InSNPs[InData[,C2],]
dim(SNP2)
[1] 3178   93
dim(pmin(SNP1,SNP2))
[1] 3178   93
GT[1,,] <- pmin(SNP1,SNP2)
dim(GT)
NULL # why??
GT[2,,] <- pmax(SNP1,SNP2)
Error in GT[2, , ] <- pmax(SNP1, SNP2) : incorrect number of subscripts

---

My understanding is that an array is just a vector with a dimension attribute, so first note that losing the dim attribute is not a great loss; it does not represent an inefficiency. But consider this code:

GT <- array(dim = c(6, 3178, 93))
dim(GT)
[1]    6 3178   93
SNP1 <- as.array(matrix(0, nrow=3178, ncol=93))
dim(SNP1)
[1] 3178   93
GT[1,,] <- SNP1
dim(GT)
[1]    6 3178   93

Here what you wanted to happen happened just fine. So the question you might ask yourself is: what is different? And that leads to asking: what class is the SNP1 object? If you can coerce it into an array, you can probably avoid the issue.

Jeremiah Rounds
Graduate Student
Utah State University
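A reduced, self-contained version of the fix (smaller dimensions so it runs instantly). The as.matrix() coercion is the point: pmin/pmax applied to data frames hand back a data frame, and coercing that to a matrix before the slice assignment keeps dim(GT) intact.

```r
GT <- array(dim = c(6, 4, 3))
SNP1 <- as.data.frame(matrix(1, 4, 3))   # data frames, as in the original problem
SNP2 <- as.data.frame(matrix(2, 4, 3))

## coerce to a matrix before assigning into the array slice
GT[1, , ] <- as.matrix(pmin(SNP1, SNP2))
GT[2, , ] <- as.matrix(pmax(SNP1, SNP2))

dim(GT)   # the dim attribute survives
```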