Re: [R] same random numbers in different sessions
Dear all Thanks for all the pointers. On Sat, Oct 9, 2010 at 11:39 PM, Daniel Nordlund djnordl...@frontier.com wrote: Could you be reloading a workspace at start-up that is setting the seed? What happens if you start R using the --vanilla option? It seems that this is the culprit. For some reason, my $HOME session already has ls(all=T) [1] .Random.seed If I open a session in /tmp, or using --vanilla then I get actual pseudo-random numbers. It seems that require(IPSUR) is not at fault, either. Regards Liviu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] same random numbers in different sessions
Hello

On Sun, Oct 10, 2010 at 1:16 AM, jim holtman jholt...@gmail.com wrote:
> You need to set the set.seed yourself. There are some simulations where I do want the same numbers generated and can use the set.seed to set it to a known value. If you want something random each time, then use the time of day in the call to set.seed.

I tried to do this, but I get funny results. I put

    try(rm(.Random.seed))
    set.seed(Sys.time())

in /usr/lib/R/etc/Rprofile.site, but I get the following error

    Error in rm(.Random.seed) : cannot remove variables from base namespace
    [Previously saved workspace restored]

and it is as if the set.seed() call didn't work, since I get the same random values.

    > rnorm(1:10)
     [1] -1.3618103  0.4241701  1.0720076  0.2208145 -0.5375314 -0.4846588
     [7]  0.7576768  0.6527407 -0.6868786  0.8718527

If I do

    > set.seed(Sys.time())
    > rnorm(1:10)
     [1] -0.6165650  0.6305187 -0.9316815  0.6034638 -0.8593514 -1.0243644
     [7] -0.1050344  0.4408562 -0.3466161  0.4058430

manually, within the session, the seed seems to be changed as requested. Am I doing something wrong?

Liviu
Re: [R] Help needed for getYahooData in TTR package writing the Yahoo data to excel
Thanks David. You mentioned that I need to insert an extra cell and move the header over one position. I tried a few times but still couldn't figure out how. Could you please advise? Many thanks.

On Sun, Oct 10, 2010 at 11:55 AM, David Winsemius dwinsem...@comcast.net wrote:
> On Oct 9, 2010, at 10:54 PM, missvanilla wrote:
>> Dear all, I'm totally new to R. Recently I've been trying to use getYahooData in the TTR package in order to download stock index daily open/high/low/close. The downloaded data is in the format of
>>
>>                Open     High      Low    Close Volume
>>     2000-01-04 18937.45 19187.61 18937.45 19002.86 0
>>     2000-01-05 19003.51 19003.51 18221.82 18542.55 0
>>     2000-01-06 18574.01 18582.74 18168.27 18168.27 0
>>     2000-01-07 18194.05 18285.73 18068.10 18193.41 0
>>     2000-01-11 18246.10 18887.56 18246.10 18850.92 0
>>     2000-01-12 18780.17 18811.87 18626.92 18677.42 0
>>     2000-01-13 18667.18 18845.03 18667.18 18833.29 0
>>     2000-01-14 18882.99 19058.02 18733.83 18956.55 0
>>     2000-01-17 19025.62 19442.58 19025.62 19437.23 0
>>     2000-01-18 19412.47 19412.47 19145.17 19196.57 0
>>
>> However, when I attempted to write the data to Excel using write.table, the dates in the first column became 1,2,3,4 in the Excel file. The same problem happened if write.csv was used. If you run these two lines of code you'll see what I mean. Before running the code, package TTR needs to be loaded.
>>
>>     N225 <- getYahooData("^N225", 2101, )
>>     write.table(N225, "Nikkei.xls", sep='\t', row.names = TRUE, col.names = NA)
>
> There is a well-described problem with write.table files going into Excel. There is no leading item or tab on the first row. You need to insert an extra cell and move the header over one position. Then you won't be misinterpreting your row.names as dates.
> -- David
>
>> Appreciate your kind assistance! Thanks a lot in advance.
Re: [R] Help needed for getYahooData in TTR package writing the Yahoo data to excel
Just to be clearer about the problem: the dates in the first column become 1,2,3,4 in Excel, as below. A screenshot of how the data appears in Excel is attached.

          Open     High      Low    Close Volume
    1 18937.45 19187.61 18937.45 19002.86 0
    2 19003.51 19003.51 18221.82 18542.55 0
    3 18574.01 18582.74 18168.27 18168.27 0
    4 18194.05 18285.73 18068.10 18193.41 0
    5 18246.10 18887.56 18246.10 18850.92 0
    . .
Re: [R] same random numbers in different sessions
On 10/10/2010 4:30 AM, Liviu Andronic wrote:
> I put
>     try(rm(.Random.seed))
>     set.seed(Sys.time())
> in /usr/lib/R/etc/Rprofile.site but I get the following error
>     Error in rm(.Random.seed) : cannot remove variables from base namespace
>     [Previously saved workspace restored]
> and it is as if the set.seed() call didn't work, since I get the same random values. [...] Am I doing something wrong?

The Rprofile.site is being executed before the saved workspace is restored. See ?Startup for the sequence of events on startup. You could put the rm() in .First() in the saved workspace and it would do what you want.

But more generally, I would say the thing you are doing wrong is saving .RData sometimes, but not consistently saving it. In my opinion it's safest to never save it; then you won't recover unexpected things from your history. But it's also safe to always save it. Then you'll get a new copy of .Random.seed saved each time.

I think the q() function makes it a little too easy to do what you did: if you intend to never save it, but answer Yes just once, you get into your situation. I don't know what the alternative should be. One possibility would be for R to record whether the workspace was restored at the start of the session, and use that to determine the default when ending it. But that would mess up people who are trying to reproduce things from identical conditions, e.g. when tracking down a bug.

Duncan Murdoch
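The .First() suggestion above could be sketched roughly as follows. This is only an illustration: the helper name drop_saved_seed is made up here, and ?Startup remains the authority on when Rprofile.site, the saved workspace and .First() are processed. Once the stale seed is gone, R creates a fresh one the next time a random number is requested.

```r
# Hypothetical helper (not from the thread): remove a seed that came back
# with a restored workspace, so that R re-seeds itself on the next RNG call.
drop_saved_seed <- function(env = globalenv()) {
  had_seed <- exists(".Random.seed", envir = env, inherits = FALSE)
  if (had_seed) rm(list = ".Random.seed", envir = env)
  invisible(had_seed)
}

# .First() is looked up and run after the saved workspace has been restored,
# so the removal happens late enough to take effect.
.First <- function() drop_saved_seed()
```

Saving the workspace once with this .First defined would carry it into later sessions started from the same directory.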
Re: [R] GC verbose=false still showing report
On 09/10/2010 9:59 PM, Robin Jeffries wrote:
> invisible(gc()) worked perfectly. Thanks Jeff.
>
> @ Josh: I know how to toggle showing/hiding command echos, but I haven't figured out how to toggle on/off any printed output.

Use results=hide as an Sweave option, e.g.

    <<echo=FALSE, results=hide>>=
    gc()
    @

Duncan Murdoch

> On Sat, Oct 9, 2010 at 5:10 PM, Robin Jeffries rjeffr...@ucla.edu wrote:
>> I must be reading the help file for gc() wrong. I thought it said that gc(verbose=FALSE) will run the garbage collection without printing the Ncells/Vcells summary. However, this is what I get:
>>
>>     > gc(verbose = FALSE)
>>              used (Mb) gc trigger  (Mb) max used  (Mb)
>>     Ncells 267097 14.3     531268  28.4   531268  28.4
>>     Vcells 429302  3.3   20829406 159.0 55923977 426.7
>>
>> I'm embedding this in an Sweave/TeX file, so I *really* can't have this printing out. Suggestions other than manually editing the TeX file?
>>
>> Robin Jeffries MS, DrPH Candidate Department of Biostatistics UCLA 530-624-0428
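For anyone puzzled by why verbose=FALSE did not help: the table is simply the return value of gc() being auto-printed at top level, which is why wrapping the call in invisible() or assigning the result silences it. A small sketch using only base R (the variable names are arbitrary):

```r
# gc() returns its summary matrix visibly, so it is printed like any other
# top-level value, whatever verbose= is set to.
printed <- capture.output(gc(verbose = FALSE))

# invisible() marks the value as not-to-be-printed; nothing is captured.
silent <- capture.output(invisible(gc(verbose = FALSE)))

# Assigning the result also avoids printing and keeps the numbers around.
stats <- gc(verbose = FALSE)

length(printed) > 0   # the summary table was printed
length(silent) == 0   # nothing was printed
```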
Re: [R] Help needed for getYahooData in TTR package writing the Yahoo data to excel
On Oct 10, 2010, at 5:27 AM, vanilla fantasy wrote:
> Just trying to be clearer on the problem faced, the dates in the 1st column become 1,2,3,4 in excel as below, the screenshot of how data appears in excel is attached.

I can affirm that there was at one time a jpg, since you copied me and my mail client is much less suspicious about attachments than is the mailing list server. Nobody else got a copy, though.

>       Open     High      Low    Close Volume
> 1 18937.45 19187.61 18937.45 19002.86 0
> 2 19003.51 19003.51 18221.82 18542.55 0
> 3 18574.01 18582.74 18168.27 18168.27 0
> 4 18194.05 18285.73 18068.10 18193.41 0
> 5 18246.10 18887.56 18246.10 18850.92 0

That was really not a data.frame in R, but rather an xts object, and when write.table converted it to a data.frame the dates (which were an attribute) got stripped off:

    > str(N225)
    An ‘xts’ object from 2000-01-04 to 2010-10-08 containing:
      Data: num [1:2644, 1:5] 18937 19004 18574 18194 18246 ...
     - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr [1:5] "Open" "High" "Low" "Close" ...
      Indexed by objects of class: [POSIXt,POSIXct] TZ:
      xts Attributes:
     NULL
    > str(as.data.frame(N225))
    'data.frame': 2644 obs. of 5 variables:
     $ Open  : num 18937 19004 18574 18194 18246 ...
     $ High  : num 19188 19004 18583 18286 18888 ...
     $ Low   : num 18937 18222 18168 18068 18246 ...
     $ Close : num 19003 18543 18168 18193 18851 ...
     $ Volume: num 0 0 0 0 0 0 0 0 0 0 ...

You probably first need to extract the dates from that xts object:

    > names(attributes(N225))
    [1] "index"       "dim"         "dimnames"    "class"       ".indexCLASS"
    [6] ".indexTZ"
    > str(attr(N225, "index"))
     num [1:2644] 9.47e+08 9.47e+08 9.47e+08 9.47e+08 9.48e+08 ...

So the dates are not in a DateTime format, but experimentation shows them to be a series of 05:00:00 times if treated as POSIXct.

    > dts <- as.POSIXct(attributes(N225)$index, origin="1970-01-01")
    > str(dts)
     POSIXct[1:2644], format: "2000-01-04 05:00:00" "2000-01-05 05:00:00" ...

You will first need to use as.data.frame to re-class the data matrix in N225 and then add a column of dates. Sorry for the initial off-base reply. I should have looked at the tab-separated file you created rather than assuming I knew what would happen. Maybe all in one stroke:

    > N225df <- cbind(dts, as.data.frame(N225))
    > str(N225df)
    'data.frame': 2644 obs. of 6 variables:
     $ dts   : POSIXct, format: "2000-01-04 05:00:00" "2000-01-05 05:00:00" ...
     $ Open  : num 18937 19004 18574 18194 18246 ...
     $ High  : num 19188 19004 18583 18286 18888 ...
     $ Low   : num 18937 18222 18168 18068 18246 ...
     $ Close : num 19003 18543 18168 18193 18851 ...
     $ Volume: num 0 0 0 0 0 0 0 0 0 0 ...

-- David.
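The extract-the-index-then-write idea can be tried without downloading anything. The sketch below uses a plain numeric matrix with an "index" attribute as a stand-in for the xts object (an assumption made so the example runs in base R alone; a real xts object carries more attributes), turns the index into dates, and writes them as an ordinary first column. The object and column names are illustrative only.

```r
# Stand-in for the xts data: a numeric matrix whose dates live in an
# "index" attribute as seconds since 1970-01-01 (illustrative values).
prices <- matrix(c(18937.45, 19003.51, 19187.61, 19003.51), nrow = 2,
                 dimnames = list(NULL, c("Open", "High")))
attr(prices, "index") <- c(946944000, 947030400)  # 2000-01-04, 2000-01-05 (UTC)

# Recover the dates from the attribute.
dts <- as.POSIXct(attr(prices, "index"), origin = "1970-01-01", tz = "UTC")

# Drop the extra attribute, convert to a data.frame, and put the dates in
# a real column so they cannot be lost as row names.
vals <- prices
attr(vals, "index") <- NULL
out <- data.frame(Date = format(dts, "%Y-%m-%d"), as.data.frame(vals))

# Tab-separated file without row names: the header lines up with the data.
tf <- tempfile(fileext = ".txt")
write.table(out, file = tf, sep = "\t", row.names = FALSE, quote = FALSE)
readLines(tf)
```

Writing with row.names = FALSE also sidesteps the header-offset issue mentioned earlier in the thread, since every column, including the dates, has its own header cell.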
[R] trycatch examples
Dear R-group,

I am looking for some good examples of tryCatch. Any pointers? The help manual seems quite limited.

Thanks.
[R] Package prabclus not available?
Hi there,

I just tried to install the package prabclus on a computer running Ubuntu Linux 9.04 using install.packages from within R. This gave me a message:

    Warning message:
    In install.packages("prabclus") : package ‘prabclus’ is not available

I tried to do this selecting two different CRAN mirrors (same result) and with other packages (installing them works fine). Looking up the CRAN mirror website I used (UK, London), there doesn't seem to be anything wrong with prabclus. (iMac checking apparently gives an error which is due to an error with package spdep on that platform in tests, but that shouldn't affect using it on Linux, or should it?)

Any explanation?

Thanks and best wishes,
Christian

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
[R] Parallel processing
1. What application should I install to speed up processing on a multicore processor in a Windows environment?
2. How do I measure the time taken to execute a particular piece of code?
Re: [R] Parallel processing
Hello Partha,

Both questions are answered here:
http://www.r-statistics.com/2010/04/parallel-multicore-processing-with-r-on-windows/

I would also recommend that you have a look here:
http://www.r-statistics.com/2010/09/using-the-plyr-1-2-package-parallel-processing-backend-with-windows/

There are claims that other packages can achieve this, but I wasn't able to make them work (I'd be glad to hear of better results from others).

Best,
Tal

Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Sun, Oct 10, 2010 at 2:58 PM, Partha Sinha pnsinh...@gmail.com wrote:
> 1.what is the application to install for to speed up processing for multicore processor in windows environment?
> 2. how to compute time for executing a particular a code?
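On the second question, base R's system.time() is probably the simplest starting point. A minimal sketch, where the workload is an arbitrary placeholder and nothing beyond base R is assumed:

```r
# Time an arbitrary block of code; the loop here is only a placeholder.
timing <- system.time({
  x <- sapply(1:10000, function(i) sqrt(i))
})

# 'timing' holds user, system and elapsed times in seconds.
print(timing)

# Wall-clock time alone:
elapsed <- timing[["elapsed"]]
```

Single runs can be noisy, so repeating the measurement a few times is usually worthwhile before drawing conclusions.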
Re: [R] Help needed for getYahooData in TTR package writing the Yahoo data to excel
On Sun, Oct 10, 2010 at 8:09 AM, David Winsemius dwinsem...@comcast.net wrote:
> That was really not a data.frame in R, but rather an xts object, and when write.table converted it to a data.frame the dates (which were an attribute) got stripped off.

write.zoo in the zoo package can write xts objects. See ?write.zoo:

    write.zoo(N225, file = "myfile.dat", ...possibly other arguments...)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] Memory management in R
> I already offered the Biostrings package. It provides more robust methods for string matching than does grepl. Is there a reason that you choose not to?

Indeed that is the way I should go, and I have installed the package after some struggling. Since Biostrings is a fairly complex package and I only need a way to check whether a certain string A is a substring of string B, do you know which Biostrings functions achieve this? I see a lot of methods for biological (DNA, RNA) sequences, and they may not apply to my series (which are definitely not from biology).

Cheers
Lorenzo
Re: [R] Hausman test for endogeneity
Dear Liviu,

thank you very much. After inspecting the options, I *guess* that systemfit is what I need. However, I absolutely don't understand how it works. I searched for a long time for detailed documentation (beyond the rather cryptic standard documentation) but found none. Does anybody have references or advice on how to conduct the test?

Best,
Holger
Re: [R] R: rulefit error on Linux
On 07.10.2010 10:55, noclue_ wrote:
> R version 2.8.1 (2008-12-22) on Linux 64-bit
>
> I am trying to run the 'rulefit' function (Rule based Learning Ensembles), but I got the following error:
>
>     > rulefit(x,y)
>     Warning: This program is an suid-root program or is being run by the root user. The full text of the error or warning message cannot be safely formatted in this environment. You may get a more descriptive message by running the program as a non-root user or by removing the suid bit on the executable.
>     xterm Xt error: Can't open display: %s
>     xterm: DISPLAY is not set
>     Error in file(file, "r") : cannot open the connection
>     In addition: Warning message:
>     In file(file, "r") : cannot open file '/root/_rulefit/rfstatus': No such file or directory
>
> On windows R 2.10, I got this run successfully. So I am wondering whether it is due to my R older version on Linux.

Well, maybe. Both R versions are old and the release candidate for R-2.12.0 is out, hence please use that one for now and switch to the release version next week.

For your original question: it looks like you had a terminal without X forwarding, i.e. R was not able to open an X11 window in order to present a plot.

Uwe Ligges

> Thanks!
[R] Matching long strings ... was Re: Memory management in R
On Oct 10, 2010, at 9:27 AM, Lorenzo Isella wrote:
>> I already offered the Biostrings package. It provides more robust methods for string matching than does grepl. Is there a reason that you choose not to?
> Indeed that is the way I should go for and I have installed the package after some struggling.

For me it was a matter of waiting. The only struggle was coming from my inner timer saying it was taking too long.

> Since biostring is a fairly complex package and I need only a way to check if a certain string A is a subset of string B, do you know the biostring functions to achieve this? I see a lot of methods for biological (DNA, RNA) sequences, and they may not apply to my series (which are definitely not from biology).

It appeared to me that the function matchPattern should replace your grepl invocation that was failing. It returns a more complex structure, so you would need to determine what would be an exact replacement for grepl(...) != 1. It looks like a no-match event results in the start and end items being of length 0.

    > str( matchPattern("A", BString("BBB")) )
    Formal class 'XStringViews' [package "Biostrings"] with 7 slots
      ..@ subject:Formal class 'BString' [package "Biostrings"] with 6 slots
      .. .. ..@ shared :Formal class 'SharedRaw' [package "IRanges"] with 2 slots
      .. .. .. .. ..@ xp:<externalptr>
      .. .. .. .. ..@ .link_to_cached_object:<environment: 0x11e0e59f8>
      .. .. ..@ offset : int 0
      .. .. ..@ length : int 3
      .. .. ..@ elementMetadata: NULL
      .. .. ..@ elementType: chr "ANY"
      .. .. ..@ metadata : list()
      ..@ start : int(0)
      ..@ width : int(0)
      ..@ NAMES : NULL
      ..@ elementMetadata: NULL
      ..@ elementType: chr "integer"
      ..@ metadata : list()

Perhaps:

    length(matchPattern(fut_string, past_string)@start ) == 0

You do need to use BString() on at least the past_string argument and maybe the fut_string as well. The BioConductor Mailing List would have a larger audience with experience using this package, so they should probably be your next avenue for advice. I am just reading the help pages as you should be able to do. The help page help("lowlevel-matching") should probably be reviewed since there may be efficiency issues to consider, as mentioned below.

When dropped into your function with the BString coercion, it replicated your small example results and did not crash after a long period with your larger example, so I then terminated it and inserted a reporter line to monitor progress. With that reporter I got up into the 200's for count_len without error. My laptop CPU was warming up the case and I was getting sleepy, so I terminated the process. (I had no way of checking for accuracy, even if I had let it proceed, since you did not offer a correct answer.)

By the way, the construct ... grepl(. , .) != 1 ... is perhaps inefficient. It could more compactly be expressed as ... ! grepl(. , .), which would not be doing coercion of logicals to integers.

-- David.
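The closing remark about grepl(...) != 1 can be checked with a toy case in base R. The strings below are invented for illustration, and fixed = TRUE is an assumption added here (treat the pattern as a literal substring rather than a regular expression); the original function may have used plain regular-expression matching.

```r
past_string <- "ABCABD"   # invented example data
fut_string  <- "CAB"

# Is fut_string a literal substring of past_string?
found <- grepl(fut_string, past_string, fixed = TRUE)

# The original construct compares a logical with the number 1, which
# forces a coercion before the comparison...
old_test <- found != 1

# ...whereas negating the logical directly gives the same answer.
new_test <- !found

identical(old_test, new_test)
```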
Re: [R] Hausman test for endogeneity
Hi Holger

On 10 October 2010 15:36, Holger Steinmetz holger.steinm...@web.de wrote:
> After inspecting the options, I *guess* that systemfit is what I need. However, I absolutely don't understand how it works. I searched long for a detailed documentation (beyond the rather cryptic standard documentation) but found none. Has anybody references/advises how to conduct the test?

A paper describing the systemfit package has been published in the Journal of Statistical Software:
http://www.jstatsoft.org/v23/i04/paper

It describes the Hausman test for testing the consistency of the 3SLS estimates against the 2SLS estimates (see sections 2.8 and 4.6). I guess (but I am not sure -- maybe others can comment on this) that you can test for the endogeneity of regressors, e.g., by

    fitSur <- systemfit( myFormula, data = myData, method = "SUR" )
    fit3sls <- systemfit( myFormula, data = myData, method = "3SLS", inst = myInst )
    hausman.systemfit( fit3sls, fitSur )

If some regressors are endogenous, the SUR estimates are inconsistent but the 3SLS estimates are consistent, given that the instrumental variables are exogenous. However, if all regressors are exogenous, both estimates should be consistent but the SUR estimates should be more efficient.

Best wishes,
Arne

--
Arne Henningsen
http://www.arne-henningsen.name
Re: [R] Package prabclus not available?
Works for me for both the London mirror as well as CRAN master. May I guess that you are under R 2.10.x on that machine?

Uwe
Re: [R] Package prabclus not available?
> Works for me for both the London mirror as well as CRAN master. May I guess that you are under R 2.10.x on that machine?

Oh yes, need to update this first. Thanks. Solved. The error message could be more informative, though...

Christian
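Since the message gives no hint that the R version is the cause, a quick version check before blaming the mirror can save some time. A rough sketch; the helper name r_meets and the version strings passed to it are made up for illustration, and the real requirement has to be read from the package's CRAN page.

```r
# Hypothetical helper: does the running R meet a given minimum version?
r_meets <- function(required) {
  getRversion() >= required
}

# Report the running version and compare it with an illustrative minimum.
cat("Running R", as.character(getRversion()), "\n")
r_meets("2.11.0")
```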
Re: [R] Matching long strings ... was Re: Memory management in R
On 10/10/2010 04:11 PM, David Winsemius wrote:

length(matchPattern(fut_string, past_string)@start ) == 0

Wow, thanks a lot! I am still testing this, but it looks like this is a good replacement for grepl. Definitely, since I am not a life scientist even from afar by training, this solution/analogy with sequencing in biology would have never come to my mind.

Cheers

Lorenzo
[R] segfault caused by `icfit` in `interval` package
Dear R community,

I am using the R package `interval` in order to perform some modelling tests of the NPMLE convergence in the case of censoring. So all I am doing is drawing a sample from an exponential distribution, making it a censored sample and computing the NPMLE of its distribution function. But when run on Linux Calculate 10.4 the program keeps crashing and reporting a segmentation fault after the call to the `icfit` function when the sample size gets to 70. When run on Windows 7 it seems to be fine. That is why I am totally confused and have decided to ask for help.

I have attached the code I am running which results in a segmentation fault if run on Linux Calculate. It has the seed set to the value which leads to this error. But it is important to note that if the parameters used in the program and the seed are changed it doesn't necessarily crash.

Here is the description of my R version and OS:

sessionInfo()
R version 2.10.1 (2009-12-14)
i686-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

After calling `icfit` the program quits with the following output (I have replaced the output concerning the arguments passed to initcomputeMLE by <arguments passed to initcomputeMLE> so that the description of the output wouldn't be too long):

 *** caught segfault ***
address 0xc, cause 'memory not mapped'

Traceback:
 1: .Call(ComputeMLEForR, R, B, max.inner, max.outer, tol)
 2: computeMLE(R, B, max.inner = max.inner, max.outer = max.outer, tol = tol)
 3: initcomputeMLE(<arguments passed to initcomputeMLE>)
 4: do.call(initfit, args = list(L = L, R = R, Lin = Lin, Rin = Rin, A = A))
 5: doTryCatch(return(expr), name, parentenv, handler)
 6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 7: tryCatchList(expr, classes, parentenv, handlers)
 8: tryCatch(expr, error = function(e) {
        call <- conditionCall(e)
        if (!is.null(call)) {
            if (identical(call[[1L]], quote(doTryCatch))) call <- sys.call(-4L)
            dcall <- deparse(call)[1L]
            prefix <- paste("Error in", dcall, ": ")
            LONG <- 75L
            msg <- conditionMessage(e)
            sm <- strsplit(msg, "\n")[[1L]]
            w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")
            if (is.na(w)) w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b")
            if (w > LONG) prefix <- paste(prefix, "\n  ", sep = "")
        }
        else prefix <- "Error : "
        msg <- paste(prefix, conditionMessage(e), "\n", sep = "")
        .Internal(seterrmessage(msg[1L]))
        if (!silent && identical(getOption("show.error.messages"), TRUE)) {
            cat(msg, file = stderr())
            .Internal(printDeferredWarnings())
        }
        invisible(structure(msg, class = "try-error"))
    })
 9: try(do.call(initfit, args = list(L = L, R = R, Lin = Lin, Rin = Rin, A = A)))
10: icfit.default(L = left, R = right)
11: icfit(L = left, R = right)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

I would greatly appreciate any help provided.

Sincerely yours, Yuliya Matveyeva.
Re: [R] segfault caused by `icfit` in `interval` package
Please try with the R-2.12.0 release candidate and the most recent version of the package (please report its version number!). If it still fails, it may be a good idea to contact the package maintainer of interval first.

Uwe Ligges
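The version details requested in this reply can be collected in one step. A minimal sketch, using the base package `stats` as a stand-in so that it runs without `interval` installed; `report_versions` is a hypothetical helper name, not a function provided by R.

```r
# Hypothetical helper: gather the R version and an installed package's
# version, the two details asked for when reporting a crash.
report_versions <- function(pkg) {
  list(
    r_version   = R.version.string,
    package     = pkg,
    pkg_version = as.character(utils::packageVersion(pkg))
  )
}

# "stats" ships with R; use "interval" on the affected machine instead
info <- report_versions("stats")
str(info)
```

Pasting this output together with sessionInfo() gives a maintainer the context needed to try to reproduce a crash.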
Re: [R] Matching long strings ... was Re: Memory management in R
On 10/10/2010 07:11 AM, David Winsemius wrote:

On Oct 10, 2010, at 9:27 AM, Lorenzo Isella wrote:

I already offered the Biostrings package. It provides more robust methods for string matching than does grepl. Is there a reason that you choose not to?

Indeed that is the way I should go for and I have installed the package after some struggling.

For me it was a matter of waiting. The only struggle was coming from my inner timer saying it was taking too long.

Since Biostrings is a fairly complex package and I need only a way to check if a certain string A is a subset of string B, do you know the Biostrings functions to achieve this? I see a lot of methods for biological (DNA, RNA) sequences, and they may not apply to my series (which are definitely not from biology). Cheers

It appeared to me that the function matchPattern should replace your grepl invocation that was failing. It returns a more complex structure, so you would need to determine what would be an exact replacement for grepl(...) != 1. Looks like a no-match event results in the start and end items being of length 0.

str( matchPattern("A", BString("BBB")) )

A couple of things from this thread. To install a Bioconductor package follow directions here

http://bioconductor.org/install/index.html#install-bioconductor-packages

which leads to

source("http://bioconductor.org/biocLite.R")
biocLite("Biostrings")

biocLite is just a wrapper around install.packages with appropriate repositories defined.

Some Bioconductor packages are relatively mature and make relatively advanced use of S4 classes, so looking at str() is not that helpful -- the way the user is meant to interact with the object is different from the way the object is implemented. So the best bet is to look at the relevant help pages

result = matchPattern("A", BString("BBB"))
class(result)
class?XStringViews

and the help pages referenced there, or from which XStringViews inherits

class(XStringViews)

and in particular

class?Ranges

Rather than accessing the 'start' slot, use start(result).

Vignettes are used heavily in Bioconductor packages, and in particular

browseVignettes("Biostrings")

pops up a page with several relevant vignettes, e.g., 'A short presentation of the basic classes...' and perhaps 'Pairwise Sequence Alignment'. These are also accessible on the Bioconductor web site, e.g., on the pages linked from

http://bioconductor.org/help/bioc-views/release/bioc/

The rule of thumb hinted at below -- that an operation seems to be taking longer than it should -- probably indicates that the function is being invoked in an inefficient way. If the documentation is opaque then definitely the place to seek additional help is on the Bioconductor mailing list

http://bioconductor.org/help/mailing-list/

Hope this helps.

Martin

Formal class 'XStringViews' [package "Biostrings"] with 7 slots
  ..@ subject        :Formal class 'BString' [package "Biostrings"] with 6 slots
  .. .. ..@ shared         :Formal class 'SharedRaw' [package "IRanges"] with 2 slots
  .. .. .. .. ..@ xp                    :<externalptr>
  .. .. .. .. ..@ .link_to_cached_object:<environment: 0x11e0e59f8>
  .. .. ..@ offset         : int 0
  .. .. ..@ length         : int 3
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ metadata       : list()
  ..@ start          : int(0)
  ..@ width          : int(0)
  ..@ NAMES          : NULL
  ..@ elementMetadata: NULL
  ..@ elementType    : chr "integer"
  ..@ metadata       : list()

Perhaps:

length(matchPattern(fut_string, past_string)@start ) == 0

You do need to use BString() on at least the past_string argument and maybe the fut_string as well. The BioConductor Mailing List would have a larger audience with experience using this package, so they should probably be your next avenue for advice. I am just reading the help pages as you should be able to do. The help page help("lowlevel-matching") should probably be reviewed since there may be efficiency issues to consider as mentioned below.

When dropped into your function with the BString coercion, it replicated your small example results and did not crash after a long period with your larger example, so I then terminated it and inserted a reporter line to monitor progress. With that reporter I got up into the 200's for count_len without error. My laptop CPU was warming up the case and I was getting sleepy so I terminated the process. (I had no way of checking for accuracy, even if I had let it proceed, since you did not offer a correct answer.)

By the way, the construct ... grepl(. , .) != 1 ... is perhaps inefficient. It could more compactly be expressed as ... !grepl(. , .) which would not be doing coercion of logicals to integers.

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
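The remark about grepl(. , .) != 1 can be checked in base R. A minimal sketch with made-up strings: `past_string` and `fut_string` reuse the names from the thread, but their values are hypothetical, and `fixed = TRUE` is an assumption added here so that the pattern is treated as a literal substring rather than a regular expression.

```r
past_string <- "abcabcabd"   # hypothetical history string
fut_string  <- "abd"         # hypothetical candidate substring

# Original construct: compares a logical with the number 1
no_match_old <- grepl(fut_string, past_string, fixed = TRUE) != 1

# More compact equivalent: negate the logical directly
no_match_new <- !grepl(fut_string, past_string, fixed = TRUE)

# Both forms give the same answer
identical(no_match_old, no_match_new)
```

This only illustrates the equivalence of the two tests; it says nothing about how base R matching compares with Biostrings on very long strings.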
[R] Help reading table rows into lists
Hi all, I have a large table mapping thousands of COGs (groups of genes) to pathways.

# Ex
COG0001 patha pathb pathc
COG0002 pathd pathe
COG0003 pathe pathf pathg pathh
##

I would like to combine this information into a big list such as below

COG2PATHWAY <- list(COG0001 = c("patha", "pathb", "pathc"), COG0002 = c("pathd", "pathe"), COG0003 = c("pathe", "pathf", "pathg", "pathh"))

I am stuck and have tried various methods involving (probably mangled) versions of lapply and loops. Any suggestions on the most efficient way to do this would be great.

Thanks, Alison

Here is my latest attempt.

#
line_num <- length(scan(file = "/g/bork8/waller/test_COGtoPath.txt", what = "character", sep = "\n"))
COG2Path <- vector("list", line_num)
COG2Path <- lapply(1:(line_num-1), function(x) scan(file = "/g/bork8/waller/test_COGtopath.txt", skip = x, nlines = 1, quiet = T, what = 'character', sep = "\t"))
#

I am getting an error

#
COG2Path <- lapply(1:(line_num-1), function(x) scan(file = "/g/bork8/waller/test_COGtopath.txt", skip = x, nlines = 1, quiet = T, what = 'character', sep = "\t"))
Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") :

But if I do scan alone I don't get an error

#

Then I suppose it looks like the easiest way to name the list variables is using unix to cut the first column out and then read that in.

names(COG2Path) <- scan(file = "/g/bork8/waller/test_col_names.txt", sep = "\t", what = "character")
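One way to build such a list is to read the file once and split each line, rather than rescanning the file for every row. A minimal sketch, assuming tab-separated fields and using a temporary file holding the three example rows in place of the real path:

```r
# Write the example rows to a temporary file (stand-in for the real table)
path <- tempfile(fileext = ".txt")
writeLines(c("COG0001\tpatha\tpathb\tpathc",
             "COG0002\tpathd\tpathe",
             "COG0003\tpathe\tpathf\tpathg\tpathh"), path)

# Read every line once, then split each line on tabs
fields <- strsplit(readLines(path), "\t", fixed = TRUE)

# The first field is the COG id, the remaining fields are its pathways
COG2PATHWAY <- lapply(fields, function(x) x[-1])
names(COG2PATHWAY) <- vapply(fields, function(x) x[1], character(1))

str(COG2PATHWAY)
```

This also removes the need for a separate file of column names. Note that the two scan calls in the attempt spell the file name differently (test_COGtoPath.txt and test_COGtopath.txt); on a case-sensitive file system that alone could explain "cannot open the connection".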
Re: [R] Memory management in R
Date: Sun, 10 Oct 2010 15:27:11 +0200
From: lorenzo.ise...@gmail.com
To: dwinsem...@comcast.net
CC: r-help@r-project.org
Subject: Re: [R] Memory management in R

I already offered the Biostrings package. It provides more robust methods for string matching than does grepl. Is there a reason that you choose not to?

Indeed that is the way I should go for and I have installed the package after some struggling. Since Biostrings is a fairly complex package and I need only a way to check if a certain string A is a subset of string B, do you know the Biostrings functions to achieve this? I see a lot of methods for biological (DNA, RNA) sequences, and they may not apply to my series (which are definitely not from biology).

Generally the differences relate to alphabet and things you may want to know about them. Unless you are looking for reverse complement text strings, there will be a lot of stuff you don't need. Offhand, I'd be looking for things like computational linguistics packages as you are looking to find patterns or predictability in human readable character sequences. Now, humans can probably write hairpin-text (look at what RNA can do, LOL) but this is probably not what you care about. However, as I mentioned earlier, I had to write my own regex compiler (coincidentally for bio apps) to get the required performance. Your application and understanding may benefit from things like building dictionaries that aren't really part of regex and that can easily be done in a few lines of C++ code using STL containers. To get statistically meaningful samples, you will almost certainly need faster code.

Cheers

Lorenzo
Re: [R] Create single vector after looping through multiple data frames with GREP
Hello all, I changed the subject line of the e-mail, because the question I'm posing now is different than the first one. I hope that this is proper etiquette. However, the original chain is included below.

I've incorporated bits of both Ethan and Brian's code into the script below, but there's one aspect I can't get my head around. I'm totally new to programming with control structures. The reproducible code below creates a list containing 19 data frames, one each for the Most Important Problem survey data for Canada. What I'd like at this stage is a loop where I can search through all the data frames for rows containing the search term and then bind the rows together in a plottable format. At the bottom of the code below, you'll find my first attempt to make use of a search string and to put it into a plottable format. It only partially works. I can only get the numbers for one year, where I'd like to be able to get a string of numbers for several years. But, on the upside, grep appears to do the trick in terms of selecting rows. Can anyone suggest a solution?

Yours truly, Simon Kiss

#This is the reproducible code to set-up all the data frames
require(XML)
library(XML)
#This gets the data from the web and lists them
mylist <- paste("http://www.queensu.ca/cora/_trends/mip_", c(1987:2001,2003:2006), ".htm", sep="")
alltables <- lapply(mylist, readHTMLTable)
#convert to dataframes
r <- lapply(alltables, function(x) {as.data.frame(x)} )
#This is just some house-cleaning; structuring all the tables so they are uniform
r[[1]][3] <- r[[1]][2]
r[[1]][2] <- c(" ")
r[[2]][4] <- r[[2]][2]
r[[2]][5] <- r[[2]][3]
r[[2]][2:3] <- c(" ")
r[[3]][4:5] <- r[[3]][3:4]
r[[3]][3] <- c(" ")
#This loop deletes some superfluous columns and rows, turns the first column in to character strings and the data into numeric
for (i in 1:19) {
n.rows <- dim(r[[i]])[1]
r[[i]] <- r[[i]][15:n.rows-3, 1:5]
n.rows <- dim(r[[i]])[1]
row.names(r[[i]]) <- NULL
names(r[[i]]) <- c("Response", "Q1", "Q2", "Q3", "Q4")
r[[i]][, 1] <- as.character(r[[i]][,1])
#r[[i]][,2:5]<-as.numeric(as.character(r[[i]][,2:5]))
r[[i]][, 2:5] <- lapply(r[[i]][, 2:5], function(x) {as.numeric(as.character(x))})
#n.rows<-dim(r[[i]])[1]
#r[[i]]<-r[[i]][9
}
#This code is my first attempt at introducing a search string, getting the rows, binding and plotting;
economy <- r[[10]][grep('Economy', r[[10]][,1]),]
economy_2 <- r[[11]][grep('Economy', r[[11]][,1]),]
test <- cbind(economy, economy_2)
plot(as.numeric(test), type='l')
#here's another attempt I'm trying
economy <- data.frame
for (i in 15:19) {
economy[i,] <- r[[i]][grep('Economy', r[[i]][,1]), ]
}

Begin forwarded message:

From: Simon Kiss sjk...@gmail.com
Date: October 7, 2010 4:59:46 PM EDT
To: Simon Kiss simonjk...@yahoo.ca
Subject: Fwd: [R] Converting scraped data

Begin forwarded message:

From: Ethan Brown ethancbr...@gmail.com
Date: October 6, 2010 4:22:41 PM GMT-04:00
To: Simon Kiss sjk...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] Converting scraped data

Hi Simon,

You'll notice the test data.frame has a whole mix of characters in the columns you're interested in, including a "-" for missing values, and that the columns you're interested in are in fact factors. as.numeric(factor) returns the level of the factor, not the value of the level. (See ?levels and ?factor)--that's why it's giving you those irrelevant integers. I always end up using something like this handy code snippet to deal with the situation:

unfactor <- function(factors)
# From http://psychlab2.ucr.edu/rwiki/index.php/R_Code_Snippets#unfactor
# Transform a factor back into its factor names
{
return(levels(factors)[factors])
}

Then, to get your data to where you want it, I'd do this:

require(XML)
theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
class(tables)
test <- data.frame(tables, stringsAsFactors=FALSE)
result <- test[11:42, 1:5] #Extract the actual data we want
names(result) <- c("Response", "Q1", "Q2", "Q3", "Q4")
for(i in 2:5) {
# Convert factor columns to numeric
result[,i] <- as.numeric(unfactor(result[,i]))
}
result

From here you should be able to plot or do whatever else you want.

Hope this helps,
Ethan Brown

On Wed, Oct 6, 2010 at 9:52 AM, Simon Kiss sjk...@gmail.com wrote:

Dear Colleagues, I used this code to scrape data from the URL contained within. This code should be reproducible.

require(XML)
library(XML)
theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
class(tables)
test <- data.frame(tables, stringsAsFactors=FALSE)
test[16,c(2:5)]
as.numeric(test[16,c(2:5)])
quartz()
plot(c(1:4), test[15, c(2:5)])

calling the values from the row of interest using test[16, c(2:5)] can bring them up as represented on the screen, plotting them or coercing them to numeric changes the values and in a way that doesn't make
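Both points in this thread (converting factor columns through their labels, and collecting the matching row from every data frame in a list) can be sketched on toy data. The two small data frames below are made-up stand-ins for the scraped tables, and `make_year`, `to_numeric` and `pick` are hypothetical helper names introduced only for this illustration.

```r
# Made-up stand-ins for two yearly tables; columns are factors on purpose
make_year <- function(q1, q2) {
  data.frame(Response = c("Economy", "Health", "Unemployment"),
             Q1 = factor(q1), Q2 = factor(q2),
             stringsAsFactors = TRUE)
}
r <- list(make_year(c("30", "12", "-"), c("28", "15", "9")),
          make_year(c("25", "18", "7"), c("22", "20", "-")))

# Convert a factor via its labels, not its internal integer codes;
# non-numeric labels such as "-" become NA
to_numeric <- function(f) suppressWarnings(as.numeric(as.character(f)))

# Pull the rows matching the search term from one data frame
pick <- function(df, term) {
  hit <- df[grep(term, as.character(df$Response)), , drop = FALSE]
  hit[-1] <- lapply(hit[-1], to_numeric)
  hit
}

# Apply to every data frame and stack the results
economy <- do.call(rbind, lapply(r, pick, term = "Economy"))

# One long vector across years and quarters, ready for plot(series, type = "l")
series <- as.vector(t(as.matrix(economy[-1])))
series
```

The design choice here is to let lapply do the looping and do.call(rbind, ...) do the binding, which avoids growing a data frame row by row inside a for loop.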
Re: [R] segfault caused by `icfit` in `interval` package
On the main site I have only found R-2.11.1.tar.gz <http://cran.gis-lab.info/src/base/R-2/R-2.11.1.tar.gz> to be the latest release (the latest stable release as far as I understand it). But unfortunately it doesn't pass `make check` on my system (that is probably the reason why the `emerge` command keeps telling me that my R version is up-to-date). Maybe I should post a separate message about this fact, but I am guessing I shouldn't because making a new release suitable for all OS's is probably just a matter of time.

But before I write to the package maintainer directly, could you please tell me if there might be a non-package-specific reason for a segfault in my case?

Sincerely yours, Yuliya Matveyeva.
Re: [R] segfault caused by `icfit` in `interval` package
The package version is "Version: 1.0-1.0", as reported by packageDescription("interval").
Re: [R] segfault caused by `icfit` in `interval` package
On 10.10.2010 19:10, Yuliya Matveyeva wrote:

But before I write to the package maintainer directly could you please tell if there might be a non-package-specific reason for a segfault in my case ?

Probably it is the package, but we would need a reproducible example to check it.

Uwe Ligges
Re: [R] Help reading table rows into lists
On Sun, Oct 10, 2010 at 11:40 AM, Alison Waller alison.wal...@embl.de wrote:
> Hi all,
> I have a large table mapping thousands of COGs (groups of genes) to pathways.
> # Ex
> COG0001 patha pathb pathc
> COG0002 pathd pathe
> COG0003 pathe pathf pathg pathh
> ##
> I would like to combine this information into a big list such as below
> COG2PATHWAY <- list(COG0001=c("patha","pathb","pathc"), COG0002=c("pathd","pathe"), COG0003=c("pathf","pathg","pathh"))
> I am stuck and have tried various methods involving (probably mangled) versions of lapply and loops. Any suggestions on the most efficient way to do this would be great.

Try this:

Lines <- "COG0001 patha pathb pathc
COG0002 pathd pathe
COG0003 pathe pathf pathg pathh"
DF <- read.table(textConnection(Lines), header = FALSE, fill = TRUE, as.is = TRUE, na.strings = "")
library(reshape2)
m <- na.omit(melt(DF, 1))
result <- unstack(m, value ~ V1)

giving

result
$COG0001
[1] "patha" "pathb" "pathc"
$COG0002
[1] "pathd" "pathe"
$COG0003
[1] "pathe" "pathf" "pathg" "pathh"

or

acast(m, value ~ V1)
      COG0001 COG0002 COG0003
patha patha   NA      NA
pathb pathb   NA      NA
pathc pathc   NA      NA
pathd NA      pathd   NA
pathe NA      pathe   pathe
pathf NA      NA      pathf
pathg NA      NA      pathg
pathh NA      NA      pathh
Levels: patha pathb pathc pathd pathe pathf pathg pathh

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] Matching long strings ... was Re: Memory management in R
On Oct 10, 2010, at 11:35 AM, Martin Morgan wrote: On 10/10/2010 07:11 AM, David Winsemius wrote: On Oct 10, 2010, at 9:27 AM, Lorenzo Isella wrote: I already offered the Biostrings package. It provides more robust methods for string matching than does grepl. Is there a reason that you choose not to? Indeed that is the way I should go for and I have installed the package after some struggling. For me is was a matter of waiting. The only struggle was coming from my inner timer saying it was taking too long. Since biostring is a fairly complex package and I need only a way to check if a certain string A is a subset of string B, do you know the biostring functions to achieve this? I see a lot of methods for biological (DNA, RNA) sequences, and they may not apply to my series (which are definitely not from biology). Cheers It appeared to me that the function matchPattern should replace your grepl invocation that was failing. It returns a more complex structure, so you would need to determine what would be an exact replacement for grepl(...) != 1. Looks like a no-match event resutls in the start and end items being of length 0. str( matchPattern(A, BString(BBB)) ) A couple of things from this thread. To install a Bioconductor package follow directions here http://bioconductor.org/install/index.html#install-bioconductor-packages which leads to source(http://bioconductor.org/biocLite.R;) biocLite(Biostrings) biocLite is just a wrapper around install.packages with appropriate repositories defined. Some Bioconductor packages are relatively mature and make relatively advanced use of S4 classes, so looking at str() is not that helpful -- the way the user is meant to interact with the object is different from the way the object is implemented. So the best bet is to look at the relevant help pages result = matchPattern(A, BString(BBB)) class(result) class?XStringViews The above was the most surprising example for me (not being particularly S4-savvy). 
Looks like it parses as: `?`(class, XStringViews) Is that an S4 sort of extension for accessing documentation or have I just missed a more general method? I tried looking at the help Index for the methods package. and the help pages referenced there, or from which XStringViews inherits class(XStringViews) and in particular class?Ranges Rather than accessing the 'start' slot, use start(result). Vignettes are used heavily in Bioconductor packages, and in particular browseVignettes(Biostrings) pops up a page with several relevant vignettes, e.g., 'A short presentation of the basic classes...' and perhaps 'Pairwise Sequence Alignment'. These are also accessible on the Bioconductor web site, e.g., on the pages linked from http://bioconductor.org/help/bioc-views/release/bioc/ The rule of thumb hinted at below -- that an operation seems to be taking longer than it should -- probably indicates that the function is being invoked in an inefficient way. If the documentation is opaque then definitely the place to seek additional help is on the Bioconductor mailing list http://bioconductor.org/help/mailing-list/ Hope this helps. Martin Formal class 'XStringViews' [package Biostrings] with 7 slots ..@ subject:Formal class 'BString' [package Biostrings] with 6 slots .. .. ..@ shared :Formal class 'SharedRaw' [package IRanges] with 2 slots .. .. .. .. ..@ xp:externalptr .. .. .. .. ..@ .link_to_cached_object:environment: 0x11e0e59f8 .. .. ..@ offset : int 0 .. .. ..@ length : int 3 .. .. ..@ elementMetadata: NULL .. .. ..@ elementType: chr ANY .. .. ..@ metadata : list() ..@ start : int(0) ..@ width : int(0) ..@ NAMES : NULL ..@ elementMetadata: NULL ..@ elementType: chr integer ..@ metadata : list() Perhaps: length(matchPattern(fut_string, past_string)@start ) == 0 You do need to use BString() on at least the past_string argument and maybe the fut_string as well. 
The BioConductor Mailing List would have a larger audience with experience using this package, so they should probably be your next avenue for advice. I am just reading the help pages as you should be able to do. The help page help(lowlevel-matching) should probably be reviewed since there may be efficiency issues to consider as mentioned below. When dropped into your function with the BString coercion, it replicated your small example results and did not crash after a long period with your larger example, so I then terminated it and insert a reporter line to monitor progress. With that reporter I got up into the 200's for count_len without error. My laptop CPU was warming up the case and I was getting sleepy so I terminated the process. (I had no way of checking for accuracy, even if I had let it proceed, since you did not offer a correct answer.) By the way, the construct ... grepl(. , .) != 1
Re: [R] Matching long strings ... was Re: Memory management in R
On 10/10/2010 11:00 AM, David Winsemius wrote: On Oct 10, 2010, at 11:35 AM, Martin Morgan wrote: On 10/10/2010 07:11 AM, David Winsemius wrote: On Oct 10, 2010, at 9:27 AM, Lorenzo Isella wrote: I already offered the Biostrings package. It provides more robust methods for string matching than does grepl. Is there a reason that you choose not to? Indeed that is the way I should go for and I have installed the package after some struggling. For me is was a matter of waiting. The only struggle was coming from my inner timer saying it was taking too long. Since biostring is a fairly complex package and I need only a way to check if a certain string A is a subset of string B, do you know the biostring functions to achieve this? I see a lot of methods for biological (DNA, RNA) sequences, and they may not apply to my series (which are definitely not from biology). Cheers It appeared to me that the function matchPattern should replace your grepl invocation that was failing. It returns a more complex structure, so you would need to determine what would be an exact replacement for grepl(...) != 1. Looks like a no-match event resutls in the start and end items being of length 0. str( matchPattern(A, BString(BBB)) ) A couple of things from this thread. To install a Bioconductor package follow directions here http://bioconductor.org/install/index.html#install-bioconductor-packages which leads to source(http://bioconductor.org/biocLite.R;) biocLite(Biostrings) biocLite is just a wrapper around install.packages with appropriate repositories defined. Some Bioconductor packages are relatively mature and make relatively advanced use of S4 classes, so looking at str() is not that helpful -- the way the user is meant to interact with the object is different from the way the object is implemented. 
So the best bet is to look at the relevant help pages result = matchPattern(A, BString(BBB)) class(result) class?XStringViews The above was the most surprising example for me (not being particularly S4-savvy). Looks like it parses as: `?`(class, XStringViews) similarly ?XStringViews-class Is that an S4 sort of extension for accessing documentation or have I just missed a more general method? I tried looking at the help Index for the methods package. ?? documents type?topic. It is more general, in that package?stats takes one to the 'stats' topic amongst the 'package' doc-type help pages. It relies on package authors choosing appropriate docTypes for their man pages. One S4 paradigm that can be useful is the analog of methods(class=lm), which is showMethods(class=XStringViews, where=package:Biostrings). Martin and the help pages referenced there, or from which XStringViews inherits class(XStringViews) and in particular class?Ranges Rather than accessing the 'start' slot, use start(result). Vignettes are used heavily in Bioconductor packages, and in particular browseVignettes(Biostrings) pops up a page with several relevant vignettes, e.g., 'A short presentation of the basic classes...' and perhaps 'Pairwise Sequence Alignment'. These are also accessible on the Bioconductor web site, e.g., on the pages linked from http://bioconductor.org/help/bioc-views/release/bioc/ The rule of thumb hinted at below -- that an operation seems to be taking longer than it should -- probably indicates that the function is being invoked in an inefficient way. If the documentation is opaque then definitely the place to seek additional help is on the Bioconductor mailing list http://bioconductor.org/help/mailing-list/ Hope this helps. Martin Formal class 'XStringViews' [package Biostrings] with 7 slots ..@ subject:Formal class 'BString' [package Biostrings] with 6 slots .. .. ..@ shared :Formal class 'SharedRaw' [package IRanges] with 2 slots .. .. .. .. ..@ xp:externalptr .. .. .. .. 
..@ .link_to_cached_object:environment: 0x11e0e59f8 .. .. ..@ offset : int 0 .. .. ..@ length : int 3 .. .. ..@ elementMetadata: NULL .. .. ..@ elementType: chr ANY .. .. ..@ metadata : list() ..@ start : int(0) ..@ width : int(0) ..@ NAMES : NULL ..@ elementMetadata: NULL ..@ elementType: chr integer ..@ metadata : list() Perhaps: length(matchPattern(fut_string, past_string)@start ) == 0 You do need to use BString() on at least the past_string argument and maybe the fut_string as well. The BioConductor Mailing List would have a larger audience with experience using this package, so they should probably be your next avenue for advice. I am just reading the help pages as you should be able to do. The help page help(lowlevel-matching) should probably be reviewed since there may be efficiency issues to consider as mentioned below. When dropped into your function with the BString coercion, it replicated your
Re: [R] Help reading table rows into lists
To get just the list you wanted, Gabor's solution is more elegant, but here's another using the apply family. First, your data:

dat <- scan(file="/g/bork8/waller/test_COGtoPath.txt", what="character", sep="\n")

I expect dat to be a vector of strings where each string is a line of values separated by tabs, which I think, by looking at your other code, is what you get.

sapply(dat, function(x){
  tmp <- unlist(strsplit(x, '\t', fixed=T))
  out <- list(tmp[seq_along(tmp)[-1]])
  names(out) <- tmp[1]
  out
}, USE.NAMES=F)

The one difference between the two is that if you have a COG with no pathways (might not be realistic or that big of a deal), this solution will have the COG name in the list with a value of character(0), whereas Gabor's will omit the COG completely. Again, probably not a big deal.

Cheers, Jeff.

On Sun, Oct 10, 2010 at 11:40 AM, Alison Waller alison.wal...@embl.de wrote:
> Hi all,
> I have a large table mapping thousands of COGs (groups of genes) to pathways.
> # Ex
> COG0001 patha pathb pathc
> COG0002 pathd pathe
> COG0003 pathe pathf pathg pathh
> ##
> I would like to combine this information into a big list such as below
> COG2PATHWAY <- list(COG0001=c("patha","pathb","pathc"), COG0002=c("pathd","pathe"), COG0003=c("pathf","pathg","pathh"))
> I am stuck and have tried various methods involving (probably mangled) versions of lapply and loops. Any suggestions on the most efficient way to do this would be great.
> Thanks, Alison
>
> Here is my latest attempt.
> #
> line_num <- length(scan(file="/g/bork8/waller/test_COGtoPath.txt", what="character", sep="\n"))
> COG2Path <- vector("list", line_num)
> COG2Path <- lapply(1:(line_num-1), function(x) scan(file="/g/bork8/waller/test_COGtopath.txt", skip=x, nlines=1, quiet=T, what='character', sep="\t"))
> #
> I am getting an error
> #
> COG2Path <- lapply(1:(line_num-1), function(x) scan(file="/g/bork8/waller/test_COGtopath.txt", skip=x, nlines=1, quiet=T, what='character', sep="\t"))
> Error in file(file, "r") : cannot open the connection
> In addition: Warning message: In file(file, "r") :
> But if I do scan alone I don't get an error
> #
> Then I suppose the easiest way to name the list variables is to use unix to cut the first column out and then read that in.
> names(COG2Path) <- scan(file="/g/bork8/waller/test_col_names.txt", sep="\t", what="character")
Re: [R] Hausman test for endogeneity
Dear Arne, this looks promising! Thank you very much. Best, Holger -- View this message in context: http://r.789695.n4.nabble.com/Hausman-test-for-endogeneity-tp2969522p2970564.html
Re: [R] Help reading table rows into lists
On Sun, Oct 10, 2010 at 2:59 PM, Jeffrey Spies jsp...@virginia.edu wrote:
> To get just the list you wanted, Gabor's solution is more elegant, but here's another using the apply family. First, your data:
> dat <- scan(file="/g/bork8/waller/test_COGtoPath.txt", what="character", sep="\n")
> I expect dat to be a vector of strings where each string is a line of values separated by tabs, which I think, by looking at your other code, is what you get.
> sapply(dat, function(x){
>   tmp <- unlist(strsplit(x, '\t', fixed=T))
>   out <- list(tmp[seq_along(tmp)[-1]])
>   names(out) <- tmp[1]
>   out
> }, USE.NAMES=F)
> The one difference between the two is that if you have a COG with no pathways (might not be realistic or that big of a deal), this solution will have the COG name in the list with a value of character(0) where Gabor's will omit the COG completely. Again, probably not a big deal.

If that is important then do it this way:

Lines <- "COG0001 patha pathb pathc
COG0002 pathd pathe
COG0003 pathe pathf pathg pathh
COG0004"
DF <- read.table(textConnection(Lines), header = FALSE, fill = TRUE, as.is = TRUE, na.strings = "")
library(reshape2)
m <- melt(DF, 1)
lapply(unstack(m, value ~ V1), complete.cases)
acast(m, value ~ V1)

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
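For comparison, here is a minimal base-R sketch of the same task that needs no extra packages and keeps a COG with no pathways as an empty entry. It assumes the file has already been read into a character vector with one tab-separated line per COG; the inline `lines` vector is a stand-in built from the example in this thread, and the names `fields` and `cog2pathway` are illustrative only.

```r
# Stand-in for readLines() on the real file: one tab-separated line per COG.
lines <- c("COG0001\tpatha\tpathb\tpathc",
           "COG0002\tpathd\tpathe",
           "COG0003\tpathe\tpathf\tpathg\tpathh",
           "COG0004")

# Split every line on literal tabs; the result is a list of character vectors.
fields <- strsplit(lines, "\t", fixed = TRUE)

# Everything after the first field is a pathway; a COG with no pathways
# ends up as character(0) rather than being dropped.
cog2pathway <- lapply(fields, function(f) f[-1])

# The first field of each line is the COG identifier.
names(cog2pathway) <- vapply(fields, function(f) f[1], character(1))
```

Individual entries can then be looked up by name, for example cog2pathway[["COG0003"]].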
[R] venneuler (java?) color palette 0 - 1
Dear UseRs and DevelopeRs

It would be helpful to see the color palette available in the venneuler() function. The relevant part of ?venneuler states:

colors: colors of the circles as values between 0 and 1

which explains color specification, but from what palette? Short of trial and error, I'd really appreciate it if someone could help me locate a 0 - 1 palette for this function to aid with color selection.

FWIW, I tried the code below and received the displayed error. I failed to turn up any solutions to this error...

Any suggestions appreciated,

Karl

library(venneuler)
ve <- venneuler(c(A=1, B=2, C=3, "A&C"=0.5, "A&B&C"=0.1))
class(ve)
[1] "VennDiagram"
ve$colors <- c("red", "green", "blue")
plot(ve)
Error in col * 360 : non-numeric argument to binary operator

-- 
Karl Brand
Department of Genetics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 704 3457 |F +31 (0)10 704 4743 |M +31 (0)642 777 268
Re: [R] Help reading table rows into lists
Thanks Gabor and Jeffrey, and thanks for explaining the differences. I think I'll go with Jeffrey's, as I think I want entries for COGs with no pathway.

Alison

On 10-Oct-10, at 8:59 PM, Jeffrey Spies wrote:
> sapply(dat, function(x){
>   tmp <- unlist(strsplit(x, '\t', fixed=T))
>   out <- list(tmp[seq_along(tmp)[-1]])
>   names(out) <- tmp[1]
>   out
> }, USE.NAMES=F)
Re: [R] Help reading table rows into lists
On Sun, Oct 10, 2010 at 3:29 PM, Alison Waller alison.wal...@embl.de wrote:
> Thanks Gabor and Jeffrey, and thanks for explaining the differences. I think I'll go with Jeffery's as I think I want entries for COGs with no pathway.

My second post does handle that case.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] venneuler (java?) color palette 0 - 1
Hi

On 11/10/2010 9:01 a.m., Karl Brand wrote:
> Dear UseRs and DevelopeRs
> It would be helpful to see the color palette available in the venneuler() function. The relevant par of ?venneuler states:
> colors: colors of the circles as values between 0 and 1
> -which explains color specification, but from what pallette? Short of trial and error, i'd really appreciate if some one could help me locate a 0 - 1 pallette for this function to aid with color selection.

The color spec stored in the VennDiagram object is multiplied by 360 to give the hue component of an hcl() colour specification. For example, 0.5 would mean the colour hcl(0.5*360, 130, 60).

Alternatively, you can control the colours when you call plot. For example,

plot(ve, col=c("red", "green", "blue"))

should work.

Paul

> FWIW, i tried the below code and received the displayed error. I failed to turn up any solutions to this error...
> Any suggestions appreciated, Karl
> library(venneuler)
> ve <- venneuler(c(A=1, B=2, C=3, "A&C"=0.5, "A&B&C"=0.1))
> class(ve)
> [1] "VennDiagram"
> ve$colors <- c("red", "green", "blue")
> plot(ve)
> Error in col * 360 : non-numeric argument to binary operator

-- 
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/
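To see the palette Karl asked about, the mapping Paul describes (value times 360 as the hue of an hcl() colour with chroma 130 and luminance 60) can be drawn as a row of swatches. This is only a sketch of that stated rule using base R; the helper name `venn_colour` is made up for illustration and is not part of venneuler.

```r
# Convert venneuler-style colour values in [0, 1] to hcl() colours using
# the rule quoted in the reply: hue = value * 360, chroma 130, luminance 60.
# `venn_colour` is a hypothetical helper, not a venneuler function.
venn_colour <- function(value) {
  stopifnot(is.numeric(value), all(value >= 0 & value <= 1))
  hcl(h = value * 360, c = 130, l = 60)
}

values <- seq(0, 1, by = 0.05)
cols <- venn_colour(values)

# One bar per value, filled with the colour that value maps to.
barplot(rep(1, length(values)), col = cols, names.arg = format(values),
        las = 2, yaxt = "n", border = NA,
        main = "hcl(value * 360, 130, 60)")
```

Reading a value off the bar labels gives a number that can be stored in ve$colors, which avoids the non-numeric error shown above.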
[R] Line Type Specification: lty=onoff but lty=offon?
Hi,

Section 'Line Type Specification' in help(par) explains how you can do custom line types. For example:

plot(NA, xlim=c(0,1), ylim=c(0,1));
abline(h=1/2, col="blue", lwd=2, lty="88");

will draw a dashed line composed of 8 units of on (blue color) and 8 units of off (transparent), then repeated. Now I'd like to draw a second red line overlapping this one, but where the gaps are now red. Technically, I think the following would define that:

abline(h=1/2, col="red", lwd=2, lty="0880");

that is 0 on, 8 off, 8 on (red color) and 0 off, then repeated. However, zeros are not allowed (actually, why not?)

Any suggestions for drawing one red and one blue dashed line such that, where they overlap, the segments alternate blue, red, blue, red, ...?

/Henrik
Re: [R] Line Type Specification: lty=onoff but lty=offon?
On Oct 10, 2010, at 5:50 PM, Henrik Bengtsson wrote:
> Hi,
> Section 'Line Type Specification' in help(par) explains how you can do custom line types. For example:
> plot(NA, xlim=c(0,1), ylim=c(0,1));
> abline(h=1/2, col="blue", lwd=2, lty="88");
> will draw a dashed line segment where the line is composed of 8 units of on (blue color) and 8 units of off (transparent), then repeated. Now I'd like to draw a second red line overlapping this one, but where the gaps are now red. Technically, I think the following would define that:
> abline(h=1/2, col="red", lwd=2, lty="0880");
> that is 0 on, 8 off, 8 on (red color) and 0 off, then repeated. However, zeros are not allowed (actually, why not?)
> Any suggestions to draw one red and one blue dashed lines that, if overlapping, the overlapping segments will be blue, red, blue, red, ...?

You might look at the code for color.scale.lines in package plotrix. It's not exactly what you asked for, but Jim Lemon has figured out how to change colors of connected segments.

-- 
David.

> /Henrik
Re: [R] Line Type Specification: lty=onoff but lty=offon?
Tena koe Henrik

Not exactly what you are requesting, but you could draw a solid line and then the dashed line over the top:

plot(1:10, panel.first=abline(0, 1, col='red', lwd=2), panel.last=abline(0, 1, col='blue', lty='88', lwd=2))

or

plot(1:10)
abline(0, 1, col='red', lwd=2)
abline(0, 1, col='blue', lty='88', lwd=2)

This may be system or graphics device dependent (I'm using Windows).

HTH

Peter Alspach

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Henrik Bengtsson
Sent: Monday, 11 October 2010 10:50 a.m.
To: r-help
Subject: [R] Line Type Specification: lty=onoff but lty=offon?

Hi,

Section 'Line Type Specification' in help(par) explains how you can do custom line types. For example:

plot(NA, xlim=c(0,1), ylim=c(0,1));
abline(h=1/2, col="blue", lwd=2, lty="88");

will draw a dashed line segment where the line is composed of 8 units of on (blue color) and 8 units of off (transparent), then repeated. Now I'd like to draw a second red line overlapping this one, but where the gaps are now red. Technically, I think the following would define that:

abline(h=1/2, col="red", lwd=2, lty="0880");

that is 0 on, 8 off, 8 on (red color) and 0 off, then repeated. However, zeros are not allowed (actually, why not?)

Any suggestions to draw one red and one blue dashed lines that, if overlapping, the overlapping segments will be blue, red, blue, red, ...?

/Henrik
[R] CRAN (and crantastic) updates this week
New packages

Updated packages

BradleyTerry2 (0.9-3), COUNT (1.1.1), DeducerPlugInScaling (0.0-6)

This email provided as a service for the R community by http://crantastic.org. Like it? Hate it? Please let us know: crana...@gmail.com.
Re: [R] Line Type Specification: lty=onoff but lty=offon?
Thanks both, but unfortunately not. Here is a better illustration of what I want to achieve:

xs <- c(0,1,2,3);
ys <- c(-1,0,0,1);
lty <- c("FF11", "1FF1");
plot(NA, xlim=c(0,3), ylim=c(-1,1));
lines(xs, ys, col="red", lwd=2, lty=lty[1]);
lines(xs, -ys, col="blue", lwd=2, lty=lty[2]);

except that I don't want those short 1:s pieces. Ideally I'd like to use:

lty <- c("FF00", "0FF0");

and dashes of any lengths, e.g.

lty <- c("2200", "0220");

/Henrik

On Sun, Oct 10, 2010 at 3:42 PM, Peter Alspach peter.alsp...@plantandfood.co.nz wrote:
> Tena koe Henrik
> Not exactly what you are requesting but you could draw a solid line and then the dashed line over the top:
> plot(1:10, panel.first=abline(0, 1, col='red', lwd=2), panel.last=abline(0, 1, col='blue', lty='88', lwd=2))
> or
> plot(1:10)
> abline(0, 1, col='red', lwd=2)
> abline(0, 1, col='blue', lty='88', lwd=2)
> This may be system or graphics device dependent (I'm using Windows).
> HTH
> Peter Alspach
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Henrik Bengtsson
> Sent: Monday, 11 October 2010 10:50 a.m.
> To: r-help
> Subject: [R] Line Type Specification: lty=onoff but lty=offon?
>
> Hi,
> Section 'Line Type Specification' in help(par) explains how you can do custom line types. For example:
> plot(NA, xlim=c(0,1), ylim=c(0,1));
> abline(h=1/2, col="blue", lwd=2, lty="88");
> will draw a dashed line segment where the line is composed of 8 units of on (blue color) and 8 units of off (transparent), then repeated. Now I'd like to draw a second red line overlapping this one, but where the gaps are now red. Technically, I think the following would define that:
> abline(h=1/2, col="red", lwd=2, lty="0880");
> that is 0 on, 8 off, 8 on (red color) and 0 off, then repeated. However, zeros are not allowed (actually, why not?)
> Any suggestions to draw one red and one blue dashed lines that, if overlapping, the overlapping segments will be blue, red, blue, red, ...?
> /Henrik
Re: [R] Line Type Specification: lty=onoff but lty=offon?
You might have to use segments() to place the line segments precisely.

?segments

From: Henrik Bengtsson h...@stat.berkeley.edu
To: Peter Alspach peter.alsp...@plantandfood.co.nz
CC: r-help r-help@r-project.org
Date: 11/Oct/2010 1:23p
Subject: Re: [R] Line Type Specification: lty=onoff but lty=offon?

Thanks both, but unfortunately not. Here is a better illustration on what I want to achieve;

xs <- c(0,1,2,3);
ys <- c(-1,0,0,1);
lty <- c("FF11", "1FF1");
plot(NA, xlim=c(0,3), ylim=c(-1,1));
lines(xs, ys, col="red", lwd=2, lty=lty[1]);
lines(xs, -ys, col="blue", lwd=2, lty=lty[2]);

except that I don't want those short 1:s pieces. Ideally I'd like to use:

lty <- c("FF00", "0FF0");

and dashes of any lengths, e.g.

lty <- c("2200", "0220");

/Henrik
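The segments() suggestion could look roughly like the sketch below for the simple horizontal-line case from the first post. It is an illustration under stated assumptions, not a general answer: dash lengths are given in user coordinates rather than in lty units, it handles one straight horizontal line rather than the polyline in the later example, and the helper name `alternating_dashes` is invented here.

```r
# Cut the interval [from, to] into consecutive dashes of length `dash` and
# assign the colours in `cols` to them in turn.
# Returns a data frame with the start, end and colour of every dash.
# `alternating_dashes` is a hypothetical helper written for this sketch.
alternating_dashes <- function(from, to, dash, cols = c("blue", "red")) {
  breaks <- seq(from, to, by = dash)
  # If `dash` does not divide the interval evenly, add a final shorter dash.
  if (tail(breaks, 1) < to) breaks <- c(breaks, to)
  n <- length(breaks) - 1
  data.frame(x0 = head(breaks, -1),
             x1 = tail(breaks, -1),
             col = rep(cols, length.out = n),
             stringsAsFactors = FALSE)
}

d <- alternating_dashes(0, 1, dash = 0.1)

plot(NA, xlim = c(0, 1), ylim = c(0, 1), xlab = "", ylab = "")
# lend = "butt" stops the line caps of neighbouring dashes from overlapping.
segments(d$x0, 0.5, d$x1, 0.5, col = d$col, lwd = 2, lend = "butt")
```

Because each dash is drawn explicitly, the two colours never overlap, which sidesteps the restriction on zeros in lty strings.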
[R] rearrange command in quantreg package
Dear all,

I want to use the rearrange command, which is based on the Chernozhukov et al. paper and is included in the quantreg package. I ran a quantile regression that includes dummy variables for state and year in order to estimate the corresponding fixed-effects quantile regression. The problems are the following:

1. In the example given in the help, I don't understand what income=quantile(income, 0.3) stands for in the newdata argument of predict. Why use a specific quantile, since we estimate the response's quantile prediction as a function of the quantile index (if I understand correctly)? If I use the following code, the predict command seems to work fine:

    dyear <- dummy(ekc$year)[,-1]
    dstate <- dummy(ekc$state)[,-1]
    dekc <- cbind(ekc, dyear, dstate)
    z.nox <- rq(nox~dyear+dstate+pcinc+I(pcinc^2)+I(pcinc^3), tau=-1, data=dekc)
    zp.nox <- predict(z.nox, newdata=list(pcinc=ekc$pcinc, dyear=dummy(ekc$year)[,-1], dstate=dummy(ekc$state)[,-1]), type="stepfun")

but when I try to plot the result with

    plot(zp.nox, do.points = FALSE, xlab = expression(tau), ylab = expression(Q ( tau )), main="Quantile Something")

the following error appears:

    Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' is a list, but does not have components 'x' and 'y'

On the other hand, when I use

    zp.nox <- predict(z.nox, newdata=list(pcinc=quantile(ekc$pcinc, 0.3), dyear=dummy(ekc$year)[,-1], dstate=dummy(ekc$state)[,-1]), type="stepfun")

the following error appears:

    Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : variable lengths differ (found for 'pcinc')

and it still does not work if I apply quantile to the dummies as well. Perhaps I am making a stupid mistake, since I haven't understood why the help example uses this 0.3 quantile of income:

    data(engel)
    z <- rq(foodexp ~ income, tau = -1, data = engel)
    zp <- predict(z, newdata=list(income=quantile(engel$income,.03)), type="stepfun")
    plot(zp, do.points = FALSE, xlab = expression(tau), ylab = expression(Q ( tau )), main="Engel Food Expenditure Quantiles")
    plot(rearrange(zp), do.points = FALSE, add=TRUE, col.h="red", col.v="red")
    legend(.6, 300, c("Before Rearrangement","After Rearrangement"), lty=1, col=c("black","red"))

2. My initial goal was to re-estimate the fitted curves (nox=b1_hat*pcinc+b2_hat*pcinc^2+b3_hat*pcinc^3) without the quantile crossing. Obviously the rearrange command does not do this. Does anybody know how I can re-estimate (if possible) the quantile regressions so that the curves for different quantiles do not cross?

Thanks a lot
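For readers unfamiliar with the idea, here is a minimal base-R sketch of what rearrangement does at one fixed covariate value. It does not call quantreg; the numbers are made up, and the equally spaced tau grid is an assumption chosen so that plain sorting is the appropriate rearrangement. The fitted quantile function is a function of tau for a single covariate setting, which is why the help example predicts at one value of income.

```r
# Made-up estimates of Q(tau | x) at ONE covariate value, on an equally
# spaced tau grid. The dip at tau = 0.5 mimics a quantile crossing.
tau  <- c(0.1, 0.3, 0.5, 0.7, 0.9)
qhat <- c(2.0, 3.1, 2.8, 4.0, 4.6)

# Rearrangement on an equally spaced grid: sort the fitted values so that
# the estimated quantile function is non-decreasing in tau.
qhat_rearranged <- sort(qhat)

# Side-by-side view of the original and the monotone version
comparison <- data.frame(tau = tau, before = qhat, after = qhat_rearranged)
```

Note that this only reorders predictions at a given covariate value; it does not re-estimate the regression coefficients, which matches the observation in point 2 above.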
[R] Number of occurences of a character in a string
New to R ... which is a function to most effectively search the number of occurrences of a character in a string?

    b <- c("jkhrikujhj345hi5hiklfjsdkljfksdio324j';;'lfd;g'lkfit34'5;435l;43'5k")

I want the number of semi-colons ; in b?

Thanks.
Re: [R] Number of occurences of a character in a string
    length(gregexpr(";", b)[[1]])
    [1] 5

This works as long as the substrings you are searching for don't overlap.

Christian

On 10/10/2010 11:18 PM, Santosh Srinivas wrote:

New to R ... which is a function to most effectively search the number of occurrences of a character in a string?

    b <- c("jkhrikujhj345hi5hiklfjsdkljfksdio324j';;'lfd;g'lkfit34'5;435l;43'5k")

I want the number of semi-colons ; in b?

Thanks.

--
Christian Raschke
Department of Economics and ISDS Research Lab (HSRG)
Louisiana State University
Patrick Taylor Hall, Rm 2128
Baton Rouge, LA 70803
cras...@lsu.edu
Re: [R] Number of occurences of a character in a string
Literally:

    length(gregexpr(";", b)[[1]])

But more generally, in case b has more than one element:

    sapply(gregexpr(";", b), length)

?gregexpr

On Mon, Oct 11, 2010 at 3:18 PM, Santosh Srinivas santosh.srini...@gmail.com wrote:

New to R ... which is a function to most effectively search the number of occurrences of a character in a string?

    b <- c("jkhrikujhj345hi5hiklfjsdkljfksdio324j';;'lfd;g'lkfit34'5;435l;43'5k")

I want the number of semi-colons ; in b?

Thanks.

--
Michael Sumner
Institute for Marine and Antarctic Studies, University of Tasmania
Hobart, Australia
e-mail: mdsum...@gmail.com
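One caveat with the length() approach is worth a small sketch: gregexpr() returns -1 for a string with no match, so length() reports 1 instead of 0. The variant below counts only positive match positions. count_char() is a hypothetical helper name introduced for this example, and the second element of b is made up to show the zero case.

```r
b <- c("jkhrikujhj345hi5hiklfjsdkljfksdio324j';;'lfd;g'lkfit34'5;435l;43'5k",
       "no semicolons here")

# gregexpr() yields -1 when nothing matches, so length() would give 1 there
pitfall <- length(gregexpr(";", b[2])[[1]])

# Hypothetical helper: count literal occurrences of 'ch' in each element of x
count_char <- function(x, ch) {
  sapply(gregexpr(ch, x, fixed = TRUE), function(m) sum(m > 0))
}

counts <- count_char(b, ";")
```

Using fixed = TRUE treats the search string literally, which avoids surprises when the character is a regular-expression metacharacter such as "." or "|".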
Re: [R] Create single vector after looping through multiple data frames with GREP
Hi Simon,

The function below should do it or at least get you started...

    getPlotData <- function (datalist, response, times) {
      qdata <- sapply(datalist[times], function(df) {
        irow <- grepl(response, df$Response)
        df[irow, 2:5]
      } )

      # qdata is a matrix with rows Q1:Q4 and cols for times;
      # we turn it into a two col matrix with col 1 = time index
      # and col 2 = value
      time.index <- seq(4 * ncol(qdata))
      out <- cbind(time.index, as.numeric(qdata))
      rownames(out) <- paste(time.index, rownames(qdata), sep=".")
      colnames(out) <- c("time", "response")
      out
    }

    # Example, get data for times 10:15 where Response contains "Economy"
    x <- getPlotData(r, "Economy", 10:15)

Michael

On 11 October 2010 03:35, Simon Kiss sjk...@gmail.com wrote:

Hello all,

I changed the subject line of the e-mail, because the question I'm posing now is different than the first one. I hope that this is proper etiquette. However, the original chain is included below.

I've incorporated bits of both Ethan and Brian's code into the script below, but there's one aspect I can't get my head around. I'm totally new to programming with control structures. The reproducible code below creates a list containing 19 data frames, one each for the Most Important Problem survey data for Canada. What I'd like at this stage is a loop where I can search through all the data frames for rows containing the search term and then bind the rows together in a plotable (sp?) format.

At the bottom of the code below, you'll find my first attempt to make use of a search string and to put it into a plotable format. It only partially works. I can only get the numbers for one year, where I'd like to be able to get a string of numbers for several years. But, on the upside, grep appears to do the trick in terms of selecting rows. Can anyone suggest a solution?
Yours truly,
Simon Kiss

    #This is the reproducible code to set-up all the data frames
    require(XML)
    library(XML)
    #This gets the data from the web and lists them
    mylist <- paste("http://www.queensu.ca/cora/_trends/mip_", c(1987:2001,2003:2006), ".htm", sep="")
    alltables <- lapply(mylist, readHTMLTable)
    #convert to dataframes
    r <- lapply(alltables, function(x) {as.data.frame(x)} )

    #This is just some house-cleaning; structuring all the tables so they are uniform
    r[[1]][3] <- r[[1]][2]
    r[[1]][2] <- c(" ")
    r[[2]][4] <- r[[2]][2]
    r[[2]][5] <- r[[2]][3]
    r[[2]][2:3] <- c(" ")
    r[[3]][4:5] <- r[[3]][3:4]
    r[[3]][3] <- c(" ")

    #This loop deletes some superfluous columns and rows, turns the first column in to character strings and the data into numeric
    for (i in 1:19) {
      n.rows <- dim(r[[i]])[1]
      r[[i]] <- r[[i]][15:n.rows-3, 1:5]
      n.rows <- dim(r[[i]])[1]
      row.names(r[[i]]) <- NULL
      names(r[[i]]) <- c("Response", "Q1", "Q2", "Q3", "Q4")
      r[[i]][, 1] <- as.character(r[[i]][,1])
      #r[[i]][,2:5]<-as.numeric(as.character(r[[i]][,2:5]))
      r[[i]][, 2:5] <- lapply(r[[i]][, 2:5], function(x) {as.numeric(as.character(x))})
      #n.rows<-dim(r[[i]])[1]
      #r[[i]]<-r[[i]][9
    }

    #This code is my first attempt at introducing a search string, getting the rows, binding and plotting;
    economy <- r[[10]][grep('Economy', r[[10]][,1]),]
    economy_2 <- r[[11]][grep('Economy', r[[11]][,1]),]
    test <- cbind(economy, economy_2)
    plot(as.numeric(test), type='l')

    #here's another attempt I'm trying
    economy <- data.frame
    for (i in 15:19) {
      economy[i,] <- r[[i]][grep('Economy', r[[i]][,1]), ]
    }

Begin forwarded message:

From: Simon Kiss sjk...@gmail.com
Date: October 7, 2010 4:59:46 PM EDT
To: Simon Kiss simonjk...@yahoo.ca
Subject: Fwd: [R] Converting scraped data

Begin forwarded message:

From: Ethan Brown ethancbr...@gmail.com
Date: October 6, 2010 4:22:41 PM GMT-04:00
To: Simon Kiss sjk...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] Converting scraped data

Hi Simon,

You'll notice the test data.frame has a whole mix of characters in the columns you're interested in, including a "-" for missing values, and
that the columns you're interested in are in fact factors. as.numeric(factor) returns the level of the factor, not the value of the level. (See ?levels and ?factor)--that's why it's giving you those irrelevant integers.

I always end up using something like this handy code snippet to deal with the situation:

    unfactor <- function(factors)
    # From http://psychlab2.ucr.edu/rwiki/index.php/R_Code_Snippets#unfactor
    # Transform a factor back into its factor names
    {
      return(levels(factors)[factors])
    }

Then, to get your data to where you want it, I'd do this:

    require(XML)
    theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
    tables <- readHTMLTable(theurl)
    n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
    class(tables)
    test <- data.frame(tables, stringsAsFactors=FALSE)
    result <- test[11:42, 1:5] #Extract the actual data we want
    names(result) <- c("Response", "Q1", "Q2","Q3","Q4")
    for(i in 2:5) {
      # Convert columns to factors
      result[,i] <- as.numeric(unfactor(result[,i]))
    }
    result

From here you should be
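The two ideas in this thread (factors must go through as.character() before as.numeric(), and matching rows can be collected from every table in the list) can be tried without the web scrape. The sketch below uses made-up toy tables with a single quarter column; make_table() and get_series() are hypothetical names introduced only for this illustration.

```r
# Toy stand-ins for the scraped tables (values are made up); Q1 is a factor
# that uses "-" for a missing value, as described in the thread
make_table <- function(v) {
  data.frame(Response = c("Economy", "Health", "Economy/Jobs"),
             Q1 = factor(v), stringsAsFactors = FALSE)
}
tabs <- list(make_table(c("20", "-", "31")), make_table(c("15", "40", "-")))

f <- tabs[[1]]$Q1
codes  <- as.numeric(f)                                  # level codes, not values
values <- suppressWarnings(as.numeric(as.character(f)))  # "-" becomes NA

# Hypothetical helper: pull the rows matching 'pattern' out of every table
# and stack them, keeping track of which table each row came from
get_series <- function(tables, pattern) {
  rows <- lapply(seq_along(tables), function(i) {
    df  <- tables[[i]]
    hit <- df[grepl(pattern, df$Response), , drop = FALSE]
    data.frame(table = rep(i, nrow(hit)),
               Response = hit$Response,
               Q1 = suppressWarnings(as.numeric(as.character(hit$Q1))))
  })
  do.call(rbind, rows)
}

econ <- get_series(tabs, "^Economy$")
```

Anchoring the pattern with ^ and $ keeps "Economy/Jobs" out of the result; a looser pattern such as "Economy" would return more than one row per table.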
[R] textConnection on List
I'm trying to optimize some code that I have. I have a list of delimited text strings (see below). I want to do a read.table via a text connection so that I can get the delimited values into a table ... Something like ...

    tmp_MF_Data_F <- read.table(textConnection(tmpTxtList), sep=';', quote = '')

... but this fails ... Any idea how to go about it? Thanks for the help.

    head(tmpTxtList)
    [[1]]
    [1] "\"106270;BIRLA SUN LIFE CAPITAL PROTECTION ORIENTED FUND-3 YRS- DIVIDEND;10.3287;10.3287;0.;01-Apr-2008\""

    [[2]]
    [1] "\"106269;BIRLA SUN LIFE CAPITAL PROTECTION ORIENTED FUND-3 YRS- GROWTH;10.3287;10.3287;0.;01-Apr-2008\""

    [[3]]
    [1] "\"102767;Birla Sun Life Dynamic Bond Fund-Retail Plan-Growth;12.6832;12.6832;12.6832;01-Apr-2008\""

    [[4]]
    [1] "\"102766;Birla Sun Life Dynamic Bond Fund-Retail Plan-Quarterly Dividend;10.5396;10.5396;10.5396;01-Apr-2008\""

    [[5]]
    [1] "\"102855;Birla Sun Life Fixed Maturity Plan - Annual Series 3-Dividend;9.9830;9.7833;9.9830;01-Apr-2008\""

    [[6]]
    [1] "\"102856;Birla Sun Life Fixed Maturity Plan - Annual Series 3-Growth;12.3964;12.1485;12.3964;01-Apr-2008\""
[R] MATLAB vrs. R
I need to find the area under a trapezoid for a research-related project. I was able to find the area under the trapezoid in MATLAB using the code:

    function [int] = myquadrature(f,a,b)
    % user-defined quadrature function
    % integrate data f from x=a to x=b assuming f is equally spaced over the interval
    % use type
    % determine number of data points
    npts = prod(size(f));
    nint = npts -1; %number of intervals
    if(npts <=1)
      error('need at least two points to integrate')
    end;
    % set the grid spacing
    if(b <=a)
      error('something wrong with the interval, b should be greater than a')
    else
      dx = b/real(nint);
    end;
    npts = prod(size(f));
    % trapezoidal rule
    % can code in line, hint: sum of f is sum(f)
    % last value of f is f(end), first value is f(1)
    % code below
    int=0;
    for i=1:(nint)
      %F(i)=dx*((f(i)+f(i+1))/2);
      int=int+dx*((f(i)+f(i+1))/2);
    end
    %int=sum(F);

Then to call myquadrature I did:

    % example function call test the user-defined myquadrature function
    % setup some data
    % velocity profile across a channel
    % remember to use ? for help, e.g. ?seq
    x = 0:10:2000;
    % you can access one element of a list of values using brackets
    % x(1) is the first x value, x(2), the 2nd, etc.
    % if you want the last value, a trick is x(end)
    % the function cos is cosine and mean gives the mean value
    % pi is 3.1415, or pi
    % another hint, if you want to multiply two series of numbers together
    % for example c = a*b where c(1) = a(1)*b(1), c(2) = a(2)*b(2), etc.
    % you must tell Matlab you want element by element multiplication
    % e.g.: c = a.*b
    % note the .
    %
    h = 10.*(cos(((2*pi)/2000)*(x-mean(x)))+1); %bathymetry
    u = 1.*(cos(((2*pi)/2000)*(x-mean(x)))+1); %vertically-averaged cross-transect velocity
    plot(x,-h)
    % set begin and end points for the integration
    a = x(1);
    b = x(end);
    % call your quadrature function. Hint, the answer should be 3.
    f=u.*h;
    val = myquadrature(f,a,b);
    fprintf('the solution is %f\n',val);

This is great, I got the expected answer of 3.
NOW THE ISSUE IS, I HAVE NO IDEA HOW THIS CODE TRANSLATES TO R. Here is what I attempted to do, and from the error messages I can tell I'm doing something wrong:

    myquadrature<-function(f,a,b){
    npts=length(f)
    nint=npts-1
    if(npts<=1)
    error('need at least two points to integrate')
    end;
    if(b<=a)
    error('something wrong with the interval, b should be greater than a')
    else
    dx=b/real(nint)
    end;
    npts=length(f)
    (below this line, I cannot code)
    int=0
    for(i in 1:(npts-1))
    sum(f)=((b-a)/(2*length(f)))*(0.5*f[i]+f[i+1]+f[length(f)])}
    %F(i)=dx*((f(i)+f(i+1))/2);
    int=int+dx*((f(i)+f(i+1))/2);
    end
    %int=sum(F);

Thank you and any potential suggestions would be greatly appreciated.

Dr. Argese.
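A possible R translation of the MATLAB function, offered as a sketch rather than a definitive answer. R has no end keyword or % comments; errors are raised with stop(), and the loop can be replaced by a vectorised sum. One deliberate change, which is an assumption on my part: the spacing is computed as (b - a)/(npts - 1) instead of b/(npts - 1). The two agree when a is 0, as in the call above. The check at the end uses a linear function, for which the trapezoidal rule is exact.

```r
# Trapezoidal rule for equally spaced data f on the interval [a, b]
myquadrature <- function(f, a, b) {
  npts <- length(f)
  if (npts <= 1) stop("need at least two points to integrate")
  if (b <= a) stop("something wrong with the interval, b should be greater than a")
  dx <- (b - a) / (npts - 1)        # assumption: spacing from the interval width
  # average each pair of neighbouring values and multiply by the spacing
  sum(dx * (f[-npts] + f[-1]) / 2)
}

x <- seq(0, 2000, by = 10)          # R equivalent of MATLAB's 0:10:2000
f <- 2 * x + 1                      # simple test data with a known integral
val <- myquadrature(f, x[1], x[length(x)])
cat("the solution is", val, "\n")
```

The data from the post can be passed in the same way: R multiplies vectors element by element with a plain *, so the MATLAB line f=u.*h becomes f <- u * h.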
Re: [R] textConnection on List
I think unlist did the trick ... I used

    tmpData <- unlist(tmpTxtList)
    tmp_MF_Data_F <- read.table(textConnection(tmpData), sep=';', quote = '')
    write.table(tmp_MF_Data_F, file="MF_Data_F.txt", append=T, sep="|", col.names=F, row.names=F, quote=F)

-----Original Message-----
From: Santosh Srinivas [mailto:santosh.srini...@gmail.com]
Sent: 11 October 2010 11:05
To: 'r-help'
Subject: textConnection on List

I'm trying to optimize some code that I have. I have a list of delimited text strings (see below). I want to do a read.table via a text connection so that I can get the delimited values into a table ... Something like ...

    tmp_MF_Data_F <- read.table(textConnection(tmpTxtList), sep=';', quote = '')

... but this fails ... Any idea how to go about it? Thanks for the help.

    head(tmpTxtList)
    [[1]]
    [1] "\"106270;BIRLA SUN LIFE CAPITAL PROTECTION ORIENTED FUND-3 YRS- DIVIDEND;10.3287;10.3287;0.;01-Apr-2008\""

    [[2]]
    [1] "\"106269;BIRLA SUN LIFE CAPITAL PROTECTION ORIENTED FUND-3 YRS- GROWTH;10.3287;10.3287;0.;01-Apr-2008\""

    [[3]]
    [1] "\"102767;Birla Sun Life Dynamic Bond Fund-Retail Plan-Growth;12.6832;12.6832;12.6832;01-Apr-2008\""

    [[4]]
    [1] "\"102766;Birla Sun Life Dynamic Bond Fund-Retail Plan-Quarterly Dividend;10.5396;10.5396;10.5396;01-Apr-2008\""

    [[5]]
    [1] "\"102855;Birla Sun Life Fixed Maturity Plan - Annual Series 3-Dividend;9.9830;9.7833;9.9830;01-Apr-2008\""

    [[6]]
    [1] "\"102856;Birla Sun Life Fixed Maturity Plan - Annual Series 3-Growth;12.3964;12.1485;12.3964;01-Apr-2008\""
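A self-contained sketch of the unlist() approach, for anyone who wants to try it without the original data. The two records are made up (modelled on the lines shown above), and stripping the wrapping double quotes with gsub() is an extra step that is not in the original post; without it, quote = "" leaves a literal quote character on the first and last fields.

```r
# Made-up records in the same shape as the post: one string per list
# element, each wrapped in literal double quotes
tmpTxtList <- list(
  "\"106270;FUND A - DIVIDEND;10.3287;10.3287;10.1000;01-Apr-2008\"",
  "\"102767;Fund B - Growth;12.6832;12.6832;12.6832;01-Apr-2008\""
)

# textConnection() needs a character vector, so flatten the list first,
# then drop the quote character at the start and end of each line
txt <- gsub('^"|"$', "", unlist(tmpTxtList))

con <- textConnection(txt)
tmp_MF_Data_F <- read.table(con, sep = ";", quote = "", stringsAsFactors = FALSE)
close(con)   # the connection was opened by us, so close it explicitly
```

In recent versions of R, read.table(text = txt, sep = ";", quote = "") does the same thing without managing the connection by hand.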