[R] searchTwitter with Unicode (UTF-8)
Dear group,

I use 'searchTwitter' to get users' tweets. It works for English words such as "iphone", but with Arabic text (for example, searchTwitter() called with an Arabic word) it gives the following error:

Error in twInterfaceObj$doAPICall(cmd, params, "GET", ...) :
  Error: Forbidden

My questions are: how do I make it understand Unicode characters? And if it is a hashtag, should I write it with a leading "#" in front of the Arabic word?

Many thanks in advance,
Ragia

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
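A minimal sketch, assuming the search term is built inside the R script: writing the Arabic word with \u escapes keeps it in UTF-8 regardless of how the script file itself is saved, and a hashtag query is simply "#" pasted in front of the word. make_hashtag_query() is a hypothetical helper, not part of twitteR, and the example word (marhaba, "hello") is only an illustration. This does not by itself establish why the API answered "Forbidden"; URLencode() is used only to inspect the UTF-8 bytes a query would carry, and its output should not be passed to searchTwitter(), which is assumed to do its own encoding.

```r
# Hypothetical helper (not part of twitteR): build a hashtag query in UTF-8.
make_hashtag_query <- function(word) {
  word <- enc2utf8(word)   # convert/mark the text as UTF-8
  paste0("#", word)        # a hashtag query is "#" followed by the term
}

# "marhaba" (hello) written with \u escapes, so the file encoding of the
# script does not matter.
q <- make_hashtag_query("\u0645\u0631\u062d\u0628\u0627")
validUTF8(q)                    # TRUE: the query is valid UTF-8
URLencode(q, reserved = TRUE)   # inspect the percent-encoded UTF-8 bytes only

# With an authenticated twitteR session one would then try:
# tweets <- searchTwitter(q, n = 50)
```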
Re: [R] recursively rename dir and files
hi jim,

thank you very much for your help: what a nice piece of code!

Inspired by your neat solution, let me post here a slight variation on your lines which now also deals with capitals in file and directory names (a potentially useful function for keeping the working directory tidy).

It works, but there is one potentially dangerous drawback that needs to be controlled somehow: it also renames any R code file that may be present in the working directory, usually named something like *.R, which becomes *.r. I definitely need to think of a viable solution to avoid that problem...

setwd(".")
dir.create("./My Dir with Spaces and Caps")
dir.create("./My Dir with Spaces and Caps/My Sub Dir With Spaces and Caps")
file.create("./My Dir with Spaces and Caps/My Dir file With Spaces and Caps.txt")
file.create("./My Dir with Spaces and Caps/My Sub Dir With Spaces and Caps/My Sub Dir File With Spaces and Caps.txt")
file.create("./My Dir with Spaces and Caps/My Sub Dir With Spaces and Caps/MySubDirFileJustWithCaps.txt")

# new function
recursive_replace_lowercase <- function(path = ".", replace = " ", with = "_", lowercase = TRUE) {
  # this is the base case: rename every entry at this level
  # (tolower() is applied to the full path, so it is safest to start
  # from a lower-case path such as ".")
  filelist <- list.files(path, full.names = TRUE)
  if (lowercase) {
    for (filename in filelist)
      file.rename(filename, gsub(replace, with, tolower(filename)))
  } else {
    for (filename in filelist)
      file.rename(filename, gsub(replace, with, filename))
  }
  # and this is the recursive part: descend into the (already renamed) subdirectories
  dirlist <- list.dirs(path, full.names = TRUE, recursive = FALSE)
  if (length(dirlist)) {
    for (dirname in dirlist)
      recursive_replace_lowercase(dirname, replace = replace, with = with, lowercase = lowercase)
  }
}
recursive_replace_lowercase()

Hi maxbre,
Try this:

recursive_replace <- function(path = ".", replace = " ", with = "_") {
  filelist <- list.files(path, full.names = TRUE)
  for (filename in filelist) {
    if (length(grep(replace, filename)))
      file.rename(filename, gsub(replace, with, filename))
  }
  dirlist <- list.dirs(path, full.names = TRUE, recursive = FALSE)
  if (length(dirlist)) {
    for (dirname in dirlist)
      recursive_replace(dirname, replace = replace, with = with)
  }
}

Jim

On Fri, Apr 10, 2015 at 5:58 AM, maxbre mbres...@arpa.veneto.it wrote:
this is my reproducible example

setwd(".")
dir.create("./my dir with space")
dir.create("./my dir with space/my sub dir with space")
file.create("./my dir with space/my dir file with space.txt")
file.create("./my dir with space/my sub dir with space/my sub dir file with space.txt")

Now I need to rename all dirs and files recursively in order to get rid of spaces; the following attempt is not getting to the point...

mylist <- dir(".", full.names = TRUE, recursive = TRUE, include.dirs = TRUE)
for (i in mylist) {
  file.rename(i, gsub(" ", "_", i))
}
# or more simply...
file.rename(mylist, gsub(" ", "_", mylist))

...because (clearly) I got some warning messages saying that a file cannot be renamed because it does not exist. I definitely understand why: once an upper-level directory has been renamed, the full paths of the nested directories and files change and are no longer visible to the procedure...

The problem now is that I'm not clear enough on how to proceed with an alternative strategy to sort out the problem properly. For reasons I'm not mentioning here, I must stick with a solution in the R language.

Any hint much appreciated, thanks

-- View this message in context: http://r.789695.n4.nabble.com/recursively-rename-dir-and-files-tp4705667.html Sent from the R help mailing list archive at Nabble.com.
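A minimal sketch of an alternative, non-recursive strategy (not the approach used in the thread): list everything once, rename the deepest entries first, and change only the last path component, so a parent directory is renamed only after everything below it and the paths still waiting to be processed stay valid. rename_tree() and its exclude argument are hypothetical; by default exclude skips names ending in .R or .r, which is one possible way to keep R scripts out of the lower-casing drawback described above. Here replace is matched as plain text (fixed = TRUE), not as a regular expression.

```r
# Hypothetical helper: rename all files and directories below `path`.
rename_tree <- function(path = ".", replace = " ", with = "_",
                        lowercase = TRUE, exclude = "\\.[Rr]$") {
  entries <- list.files(path, recursive = TRUE, include.dirs = TRUE,
                        full.names = TRUE)
  # Depth = number of "/" separators; process the deepest paths first.
  depth <- nchar(gsub("[^/]", "", entries))
  entries <- entries[order(depth, decreasing = TRUE)]
  for (old in entries) {
    base <- basename(old)
    if (grepl(exclude, base)) next              # e.g. leave analysis.R alone
    new_base <- gsub(replace, with, base, fixed = TRUE)
    if (lowercase) new_base <- tolower(new_base)
    if (new_base != base)                       # only touch the last component
      file.rename(old, file.path(dirname(old), new_base))
  }
  invisible(NULL)
}
```

Running rename_tree(".") acts on the whole working directory, so trying it first on a copy or inside a tempdir() sandbox seems wise.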
[R] Question on CCA and RDA analysis
Dear R experts,

I would like to know if you can suggest any website or tutorial for learning how to do an RDA or CCA in R.

Thanks in advance!
[R] multiple input files single output - lapply? - pls advise
Dear R users,

I hope you can point me in the right direction; I am stuck with the following problem.

My function f1 reads a csv file, manipulates it into the right xts format and returns the output at the end. This works perfectly fine. Now I need to run f1 over a long list of csv files, all of the same format but with different dates (or time stamps), which is guaranteed by design. I hope to combine the individual results for all files into one xts or zoo object.

I tried lapply as follows:

# all csv files start with z1
# (list.files() takes a regular expression, not a wildcard: "^z1" means "starts with z1")
file.names <- list.files(pattern = "^z1", full.names = TRUE, recursive = FALSE)

# my function:
f1 <- function(x, param) {
  # x: the csv file
  # param: some parameter for calculation
  # ... builds and spits out `results`
  return(results)
}

res <- lapply(file.names, function(x) f1(x, param))

I wrote the output to res, and by subsetting res[[1]], res[[2]], ... I can retrieve the result for each individual csv file on which I applied f1.

How could I append or merge all the individual results res[[i]] for my csv files into one xts or zoo object?

Many thanks in advance,
Bernard
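A minimal sketch, assuming each CSV has a "time" column and a "value" column and that the xts package is installed (Bernard's real f1 is not shown, so the f1 below is a hypothetical stand-in): lapply() returns a list of xts objects, and do.call(rbind, res) row-binds them into one xts object, with rbind() on xts objects keeping the combined rows in time order.

```r
library(xts)  # assumed to be installed; loads zoo as well

# Hypothetical stand-in for Bernard's f1(): read one CSV with columns
# "time" and "value" and return an xts series indexed by time.
f1 <- function(x, param) {
  d <- read.csv(x, stringsAsFactors = FALSE)
  xts(d$value * param, order.by = as.POSIXct(d$time, tz = "UTC"))
}

# Apply f1() to every file, then stack the per-file results by row.
combine_files <- function(file.names, param) {
  res <- lapply(file.names, f1, param = param)  # list of xts objects
  do.call(rbind, res)                           # one xts object
}

# file.names  <- list.files(pattern = "^z1.*\\.csv$", full.names = TRUE)
# all_results <- combine_files(file.names, param)
```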
[R] Finding values in a dataframe at a specified hour
Hello,

I have a large data frame (windHW) of wind speeds (ws) for each hour of many days over a set of years. Some of these values are obviously wrong (600 m/s), and I want to get rid of all values that are larger than 5*sigma for their hour. The 5*sigma values (variable name sigma5) are stored in a separate data frame for each season, each named after its season. For example, in the data frame spring, the 5*sigma value for hour 1 is 79.6 m/s.

So my question is: how can I make the code find all wind speeds in windHW at a given hour that are higher than the 5*sigma value for that hour? For example, I would like to find whether any wind speeds at hour 1 are higher than 79.6 m/s and, if so, replace them with NA. I have something like this, but I can't figure out how to make it work for specific hours:

windHW$ws[windHW$ws >= spring$sigma5] <- NA

I imported the data into the data frame windHW using readLines. I am using R version 3.1.1.

Any help would be appreciated!

Thanks,
Alexandra
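A minimal vectorized sketch, assuming windHW has columns ws and hour (0-23) and each season's table (such as spring) has columns hour and sigma5: match() looks up the limit that belongs to each row's hour, so no loop is needed. remove_above_sigma5() is a hypothetical helper, and its optional rows argument is an assumption for applying one season's limits only to that season's rows; how those rows are identified depends on a date column not described in the post. Following the wording "larger than 5*sigma", values strictly greater than the limit become NA.

```r
# Hypothetical helper: set wind speeds above the hourly 5*sigma limit to NA.
# Assumes wind has columns ws and hour, thresholds has columns hour and sigma5.
remove_above_sigma5 <- function(wind, thresholds, rows = seq_len(nrow(wind))) {
  # Limit belonging to each selected row's hour (NA if that hour is missing).
  limit <- thresholds$sigma5[match(wind$hour[rows], thresholds$hour)]
  # Flag only rows with a known limit and a known, larger wind speed.
  bad <- !is.na(limit) & !is.na(wind$ws[rows]) & wind$ws[rows] > limit
  wind$ws[rows[bad]] <- NA
  wind
}

# windHW <- remove_above_sigma5(windHW, spring)               # all rows
# windHW <- remove_above_sigma5(windHW, spring, spring_rows)  # hypothetical spring-only rows
```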
[R] How to complete this code
Hi,

Consider the line below:

for (r in a) for (s in a) x = rbind(x, apply(replicate(1000, V(r, s)), 1, mean))

V is a vector of (n-1) variables calculated by some rule, and it is a function of (r,s). So the line above produces 1000 replicates of V for each (r,s), puts them in a matrix, calculates their mean, and finally collects the means for all (r,s) in a matrix. So each row of the resulting matrix x holds the means of the (n-1) components of V for one possible value of (r,s).

Now, for simplicity, fix (r,s) at a single point and let n = 5. Then each replicate produces one V, a vector of 4 variables. Name the elements of V U1, U2, U3 and U4. Then we can write V(i) = [U1i, U2i, U3i, U4i] for the V produced in replicate i (i = 1, 2, ..., 1000). Therefore x = [x1, x2, x3, x4] is the vector of means calculated at the end.

What I need is to first calculate the following vector for each replicate (i = 1, ..., 1000):

Er(i) = [ |U1i - x1|, |U2i - x2|, |U3i - x3|, |U4i - x4| ]

where |A| denotes the absolute value of A. Then I should calculate the mean of the Er(i)'s and put the result in a vector.

I just don't know how to calculate the Er(i)'s within the line above; in other words, I don't know where I should add the required code to that line.

Thanks in advance for any help!
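A minimal sketch, assuming a hypothetical V() that returns n - 1 = 4 normally distributed values (the real rule is not given) and a hypothetical grid a: the key change is to keep the 1000 replicates in a matrix reps instead of passing them straight to apply(), so the same draws are reused both for the means x and for the absolute deviations |Uki - xk|. Because reps has one row per component, reps - m subtracts xk from every entry of row k, and rowMeans(abs(reps - m)) is the mean of the Er(i)'s over the replicates.

```r
set.seed(1)  # reproducible random draws for the illustration

# Hypothetical V(): n - 1 random values depending on (r, s).
V <- function(r, s, n = 5) rnorm(n - 1, mean = r, sd = s)

a  <- c(1, 2)   # hypothetical grid of (r, s) values
x  <- NULL      # one row of component means per (r, s)
er <- NULL      # one row of mean absolute deviations per (r, s)
for (r in a) for (s in a) {
  reps <- replicate(1000, V(r, s))   # (n-1) x 1000: column i is V(i)
  m    <- rowMeans(reps)             # x1, ..., x4, same as apply(reps, 1, mean)
  # abs(reps - m) holds Er(1), ..., Er(1000) as columns;
  # rowMeans() then averages them over the replicates i.
  x  <- rbind(x, m, deparse.level = 0)
  er <- rbind(er, rowMeans(abs(reps - m)), deparse.level = 0)
}
x
er
```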
Re: [R] Finding values in a dataframe at a specified hour
Update: I have this so far.

* The first column of windHW is the wind speed. The 5th column of the data frame spring is the 5*sigma value for each hour. hourRow gives all the rows of wind speed at a given hour.

for (i in 0:23) {
  hourRow = which(windHW$hour == i, arr.ind = TRUE)
  for (h in hourRow) {
    if (windHW[h, 1] >= spring[spring$hour == i, 5]) {
      windHW[h, 1] <- NA
    }
  }
}

This then gives the error:

Error in if (windHW[h, 1] >= spring[spring$hour == i, 5]) { :
  argument is of length zero

*Note: The data frame for each season has 24 rows, corresponding to the hours of the day 0:23.

Thanks,
Alexandra

On Fri, Apr 10, 2015 at 1:07 PM, Alexandra Catena amc5...@gmail.com wrote:
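The message comes from if(): its condition must be a single TRUE or FALSE, and "argument is of length zero" means the comparison produced no value at all. One way this can happen, sketched here under the assumption that the lookup column is not actually named hour (in the reproduction it is called Hour), is that spring[spring$hour == i, 5] selects no row. The guarded loop keeps the structure of the post but checks for exactly one limit before comparing; clean_by_hour() is a hypothetical helper and the column names are the ones described in the post.

```r
# Reproduce the message with a table whose hour column is called "Hour".
spring <- data.frame(Hour = 0:23, sigma5 = 79.6)
limit <- spring[spring$hour == 1, 2]   # spring$hour is NULL, so no row is selected
length(limit)                          # 0
msg <- tryCatch(if (10 >= limit) "too high", error = conditionMessage)
msg                                    # "argument is of length zero"

# Guarded version of the loop: one limit per hour, all matching rows at once.
clean_by_hour <- function(windHW, lookup) {
  for (i in 0:23) {
    limit <- lookup$sigma5[lookup$hour == i]
    if (length(limit) != 1) {          # no (or several) limits for this hour
      warning("no unique sigma5 for hour ", i)
      next
    }
    rows <- which(windHW$hour == i & windHW$ws >= limit)
    windHW$ws[rows] <- NA
  }
  windHW
}
```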
Re: [R] Removing words and initials with tm
Hi Sun,

No, I was thinking of something like hunspell, which seems to fit the sort of work that you are doing.

Jim

On Fri, Apr 10, 2015 at 11:42 PM, Sun Shine phaedr...@gmail.com wrote:
Thanks Jeff. I'll add that to the ever-growing list my current studies are generating daily. :-)
Cheers
S

On 10/04/15 14:32, Jeff Newmiller wrote:
"I suspect that it might have something to do with regular expressions, but to be honest, I'm (currently) pretty crap with those."
I cannot think of a better incentive to take action on this hole in your education and buckle down to learn regular expressions. There are many books and tutorials available.
---
Jeff Newmiller, DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On April 10, 2015 3:19:51 AM PDT, Sun Shine phaedr...@gmail.com wrote:
Hi list,

Using the tm package, part of the pre-processing work is to remove words, etc. from the corpus. I wish to remove people's names and also their initials, which are peppered throughout the corpus. But some people's initials are the same as parts of common words: e.g. removing 'am' turns 'became' into 'bec e', removing 'ec' turns 'because' into 'b ause', and removing 'ar' turns 'arrival' into 'rival' (which has a completely different meaning).

Is there any way of doing this without leaving a trail of nonsense half-terms behind? I suspect that it might have something to do with regular expressions, but to be honest, I'm (currently) pretty crap with those.

Would it make a difference if I removed initials and names *prior* to converting all text to lower case, so that I remove 'AM' while 'became', being lower case, remains unaffected?

Any recommendations on how best to proceed with this?

Thanks as always.
Sun
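A minimal base-R sketch of the regular-expression approach Jeff points to: \b word boundaries make the pattern match only whole words, so "AM" cannot match inside "became", and matching case-sensitively before converting to lower case also keeps the ordinary word "am" (as in "I am"), which would be deleted if the initial were removed after tolower(). remove_tokens() is a hypothetical helper that assumes the names and initials are plain letters (regex metacharacters such as "." would need escaping). With tm, such a function could be applied through tm_map() and content_transformer(), which is worth checking against the installed tm version.

```r
# Hypothetical helper: remove whole-word, case-sensitive tokens from text.
remove_tokens <- function(text, tokens) {
  pattern <- paste0("\\b(", paste(tokens, collapse = "|"), ")\\b")
  out <- gsub(pattern, "", text, perl = TRUE)
  gsub("\\s+", " ", trimws(out))   # tidy up the gaps left behind
}

remove_tokens("AM became ill because EC arrived; AR noted the arrival.",
              c("AM", "EC", "AR"))
# "became ill because arrived; noted the arrival."
```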
Re: [R] Finding values in a dataframe at a specified hour
Hi Alexandra,

The error probably comes from the first iteration of i in 0:23. As indexing in R begins at 1, there is no element 0. Try using:

for (i in 1:24) { ...

and see what happens.

Jim

On Sat, Apr 11, 2015 at 7:06 AM, Alexandra Catena amc5...@gmail.com wrote:
Re: [R] Finding values in a dataframe at a specified hour
Hi Jim,

Thanks for the response, but unfortunately it results in the same error. I think something is wrong with the if statement. I tried it out manually for the first row and hour that it's testing, and indeed the wind speed is not higher than the 5*sigma value. Since it is not higher than the 5*sigma value, I would expect it to just move on to the next iteration, yet it doesn't. I will keep trying!

Thanks,
Alexandra

On Fri, Apr 10, 2015 at 3:43 PM, Jim Lemon drjimle...@gmail.com wrote:
Re: [R] Question on CCA and RDA analysis
Luis Fernando García luysgarcia at gmail.com writes:

Dear R experts, I would like to know if you can suggest any website or tutorial for learning how to do an RDA or CCA in R. Thanks in advance!

I hate to ask, but did you try Googling "canonical correspondence analysis R" ... ?
Re: [R] Question on CCA and RDA analysis
Yeah,

The most useful example I found was this: https://gist.github.com/perrygeo/7572735. I always had the idea that this kind of forum was meant to provide sources that are not so obvious on the web. If you have something better, it would be great.

2015-04-10 18:36 GMT-03:00 Ben Bolker bbol...@gmail.com:
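For orientation, a minimal base-R sketch of what redundancy analysis (RDA) computes under the usual textbook definition: regress each standardized response column on the explanatory variables, then run a PCA on the fitted values; the eigenvalues are the variance explained along each constrained axis. rda_sketch() and the choice of mtcars variables are purely illustrative assumptions. For real analyses, the vegan package provides rda() and cca() (canonical correspondence analysis, its chi-square-based counterpart for abundance data) with formula interfaces and permutation tests.

```r
# Didactic sketch of RDA: PCA of the fitted values of a multivariate regression.
rda_sketch <- function(Y, X) {
  Y <- scale(as.matrix(Y))                  # centre and standardize responses
  X <- as.data.frame(X)
  fit <- lm(Y ~ ., data = X)                # one regression per response column
  fitted_Y <- fitted(fit)                   # constrained (explained) part of Y
  pca <- prcomp(fitted_Y, center = FALSE)   # fitted values already have mean 0
  eig <- pca$sdev^2                         # variance along each RDA axis
  list(eig = eig,
       prop_constrained = sum(eig) / sum(apply(Y, 2, var)),
       axes = pca$rotation,
       fitted = fitted_Y)
}

res <- rda_sketch(mtcars[, c("mpg", "disp", "hp")], mtcars[, c("wt", "cyl")])
round(res$eig, 3)      # at most 2 non-zero axes (two explanatory variables)
res$prop_constrained   # share of total variance explained by wt and cyl
```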
Re: [R] Finding values in a dataframe at a specified hour
Hi Alexandra,

I answered too quickly. Your response made me look for a deeper error: the value of i doesn't matter, as it isn't being used as an index. However, the first value, i = 0, may cause the error in the second loop, where h is used as an index.

for (i in 0:23) {
  hourRow = which(windHW$hour == i, arr.ind = TRUE)
  for (h in hourRow) {
    if (windHW[h + 1, 1] >= spring[spring$hour == i, 5]) {
      windHW[h + 1, 1] <- NA
    }
  }
}

Jim

On Sat, Apr 11, 2015 at 9:24 AM, Alexandra Catena amc5...@gmail.com wrote:
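A quick check of the indexing question raised above: which() returns 1-based row positions, even when the hour being matched is 0, so h already points at the right row of windHW, and shifting it with h + 1 would test and blank the row after each match. The toy data frame below is hypothetical.

```r
# Hypothetical toy data: two readings at hour 0 and one at hour 1.
w <- data.frame(hour = c(0, 1, 0), ws = c(3, 4, 5))
h <- which(w$hour == 0)
h              # 1 3 : row positions, counted from 1
w$ws[h]        # 3 5 : the wind speeds recorded at hour 0
w$ws[h + 1]    # 4 NA: shifted by one row, the wrong values
```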